Batch Processor Documentation
Purpose
Provide a means to process a file of many address records through the LightBox geocoder in a single job. Batch processing is significantly more efficient than issuing individual geocoding requests because it avoids the network latency incurred by each request. We recommend Batch Processing for any workload other than interactive geocoding or autocomplete. Multiple files can be uploaded and processed simultaneously. Each file, or job, can have its own configuration of the data attributes to be retrieved, or share a consistent one. Batch jobs are secure: each job has a private job identifier and is accessible only to the user who created it. Batch jobs can be started, canceled, and checked for their current processing status, including an estimated time to complete.
Features
Obtain a secure upload and download link for the CSV files
A secure job token, used to monitor the job with secure access to the job’s files
The ability to start or cancel a job
Supports very large files > 5GB
Flexible field output
Job processing status with estimated time to complete
The process involves:
Obtaining a secure and unique upload URL and a job token to a secure cloud storage
Uploading the CSV file to a specific job token
Starting the geocoding process
Using the job token to monitor the job
Using the job token to retrieve a secure download link to the completed CSV file
Downloading the CSV file
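The steps above map onto the endpoints documented later on this page roughly as follows. This is a minimal Python sketch for orientation only: `workflow_steps` is a hypothetical helper, `<signedUrl.url>` stands for the signed URL returned by the API, and the use of PUT for the upload is an assumption rather than something this page states.

```python
def workflow_steps(token: str) -> list[tuple[str, str]]:
    """Return the (HTTP method, path) sequence for a single-file batch job."""
    return [
        ("POST", "/batch/files/upload"),            # get a signed upload URL and job token
        ("PUT", "<signedUrl.url>"),                 # upload the CSV (PUT is an assumption)
        ("POST", f"/batch/process/{token}"),        # start the geocoding job
        ("GET", f"/batch/process/{token}"),         # monitor the job status
        ("GET", f"/batch/files/download/{token}"),  # get a signed download URL
        ("GET", "<signedUrl.url>"),                 # download the completed CSV
    ]
```

The job token is only known after the first call returns, so in practice the list would be built once the upload URL has been obtained.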
Requirements
The LightBox APIs are hosted in the cloud and therefore have no platform requirements. Application requirements include:
- A network connection to the LightBox API server
- Ability to parse JavaScript Object Notation (JSON) API responses
- Secure HTTPS connection
- LightBox authentication key
Connecting your account
When your LightBox user account is created, a unique API key is also generated. The API key should be kept secret at all times and can only be used for API requests. The key is required in all API calls.
To retrieve your unique API key:
- Log in to the LightBox Developer Portal
- Select Apps from the menu bar
- In your approved App, note your API key (under Consumer Key)
Performing API requests
All API requests must be made over secure HTTPS connections. Requests made over HTTP will fail.
All API requests are made to the base URL https://api.lightboxre.com/ followed by a version number, for example https://api.lightboxre.com/v1
Authentication
LightBox APIs use token-based authentication, and all requests must be authenticated. Pass the token in an HTTP header with the key 'x-api-key' and the value <Your authentication token>.
Pass your unique API key in this header on every LightBox API call. LightBox uses it to authenticate your identity and determine whether you have sufficient permissions to complete the operation.
curl -X GET -H 'x-api-key: (api_key)' https://api.lightboxre.com/
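The same header can be attached from code. The sketch below uses only the Python standard library; `build_request` and `BASE_URL` are illustrative names, not part of any LightBox SDK, and the function only builds the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request

BASE_URL = "https://api.lightboxre.com/v1"


def build_request(path, api_key, method="GET", body=None):
    """Build (but do not send) an authenticated LightBox API request."""
    headers = {"x-api-key": api_key}
    data = None
    if body is not None:
        # JSON bodies are sent with the Content-Type used in the curl examples
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )

# Sending is a separate step, for example:
#   with urllib.request.urlopen(build_request("/batch/process/<token>", "YOUR_KEY")) as resp:
#       payload = json.load(resp)
```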
API Requests
File Upload < 5GB
Use this endpoint to retrieve an upload URL, to be used to upload your CSV file to a secure location.
POST /batch/files/upload
Example requests
curl -d '{"fileName": "file.csv"}' -H "Content-Type: application/json" -H 'x-api-key: (api_key)' -X POST https://api.lightboxre.com/v1/batch/files/upload
Request Body
{
"fileName": "file.csv"
}
Field | Description |
---|---|
fileName | Name of your CSV file |
Response
Media type: application/json
{
"$ref": "string",
"signedUrl": {
"expiry": "60 minutes",
"token": "3b4ca2ae-4d64-46a3-b08b-2d088682f78c",
"url": "https://foo.amazonaws.com/uat/files/5555555-4d64-46a3-b08b-2d088682f78c/filename.csv?X-Amz-Algorithm=AWS4-5555555SHA256&X-Amz-Credential=555555555PFKBAAUEOSV3%2F20240731%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240731T181736Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&x-id=PutObject&X-Amz-Signature=55555555555c8f7d61848317ba3a80863eb3d2310c74467edad295928"
}
}
Field | Description |
---|---|
$ref | URL reference back to this call |
signedUrl.expiry | Lifetime of this signed URL. After this period elapses, the URL becomes invalid. |
signedUrl.token | Job token. This token is used to reference this process or job to get job status, clean up files, cancel a job, or retrieve a download link. |
signedUrl.url | Pre-signed URL to which the CSV file is uploaded |
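Once the response arrives, the job token and signed URL need to be pulled out and the file sent to the signed URL. The following is a minimal sketch under two assumptions that this page does not confirm: the signed URL accepts an HTTP PUT of the raw file bytes, and no x-api-key header is needed on that request because the signature is carried in the URL. The helper names are hypothetical.

```python
import urllib.request


def parse_upload_response(payload: dict) -> tuple[str, str]:
    """Return (job_token, signed_url) from a /batch/files/upload response."""
    signed = payload["signedUrl"]
    return signed["token"], signed["url"]


def build_put_request(signed_url: str, csv_bytes: bytes) -> urllib.request.Request:
    """Build the upload request for the signed URL (PUT is an assumption)."""
    return urllib.request.Request(signed_url, data=csv_bytes, method="PUT")

# Sending would look like:
#   with open("file.csv", "rb") as f:
#       urllib.request.urlopen(build_put_request(url, f.read()))
```

Keep the returned token: every later call (start, status, download, cleanup) refers to the job by it.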
File Upload > 5GB
Use this endpoint to retrieve a multipart file upload URL for files that are greater than 5GB in size, which is used to upload large CSV files to a secure location. The total set of files is considered for a single batch processing job. Each file must be a well-formed CSV file.
POST /batch/files/initiate-upload
Example requests
curl -d '{"fileName": "file.csv", "numberOfParts": 5}' -H "Content-Type: application/json" -H 'x-api-key: (api_key)' -X POST https://api.lightboxre.com/v1/batch/files/initiate-upload
Request Body
{
"fileName": "file.csv",
"numberOfParts": 5
}
Field | Description |
---|---|
fileName | Name of your CSV file |
numberOfParts | Number of parts that you want to upload. A CSV file larger than 5GB must be broken into multiple parts. |
Response
Media type: application/json
{
"$ref": "string",
"signedUrl": {
"expiry": "60 minutes",
"token": "3b4ca2ae-4d64-46a3-b08b-2d088682f78c",
"urls": [
"https://foo.amazonaws.com/uat/files/5555555-4d64-46a3-b08b-2d088682f78c/filename.csv?X-Amz-Algorithm=AWS4-5555555SHA256&X-Amz-Credential=555555555PFKBAAUEOSV3%2F20240731%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240731T181736Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&x-id=PutObject&X-Amz-Signature=55555555555c8f7d61848317ba3a80863eb3d2310c74467edad295928"
],
"uploadId": "ef.KCNxfpJwo1ZUaDgUsN9W4aiYScrjmv1xNOYpSYaLmgh_UDguAo51N8FeNK4LS95NPZpEVoGtjaF0DMFS7u1goMFDkai3tDaCjUpsmBfRmSjRXSIXRwSB7jaxqHNef"
}
}
Field | Description |
---|---|
$ref | URL reference back to this call |
signedUrl.expiry | Lifetime of these signed URLs. After this period elapses, the URLs become invalid. |
signedUrl.token | Job token. This token is used to reference this process or job to get job status, clean up files, cancel a job or retrieve a download link. |
signedUrl.urls | Collection of pre-signed URLs, one for each part |
signedUrl.uploadId | Upload ID to denote the group of files to be uploaded. This property is used in the complete upload call telling the system that you have completed all parts of the upload. |
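Choosing numberOfParts comes down to dividing the file into byte ranges. The sketch below plans those ranges; the 1 GiB default part size is an arbitrary illustration (this page does not state minimum or maximum part sizes), and pairing part N with the Nth entry of signedUrl.urls is an assumption.

```python
GIB = 1024 ** 3


def plan_parts(file_size: int, part_size: int = GIB) -> list[tuple[int, int, int]]:
    """Split file_size bytes into (part_number, offset, length) tuples.

    Part numbers start at 1; the last part may be shorter than part_size.
    """
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    count = (file_size + part_size - 1) // part_size  # integer ceiling division
    return [
        (i + 1, i * part_size, min(part_size, file_size - i * part_size))
        for i in range(count)
    ]
```

`len(plan_parts(size))` is the value to send as numberOfParts, and each tuple tells you which slice of the file to read for that part.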
Complete Multi-part upload for files > 5GB
This call is made once all the parts of a large file have been uploaded.
POST /batch/files/complete-upload
Example requests
curl -d '{"parts": [{"partNumber": 1, "eTag":"555"}],"token": "555", "uploadID": "555"}' -H "Content-Type: application/json" -H 'x-api-key: (api_key)' -X POST https://api.lightboxre.com/v1/batch/files/complete-upload
Request Body
{
"parts": [
{
"partNumber": 2,
"eTag": "KCNxfpJwo1ZUaDgUsN9W4aiYScrjmv1xNOYpSYa"
}
],
"token": "3b4ca555-4d64-46a3-b08b-2d088682f78c",
"uploadID": "ef.555o1ZUaDgUsN9W4aiYScrjmv1xNOYpSYaLmgh_UDguAo51N8FeNK4LS95NPZpEVoGtjaF0DMFS7u1goMFDkai3tDaCjUpsmBfRmSjRXSIXRwSB7jaxqHNef"
}
Field | Description |
---|---|
parts | Collection of file parts |
parts[0].partNumber | Denotes a part |
parts[0].eTag | When you upload a part to its signed URL, an eTag is returned. This value must be sent back with the part. |
token | Job/Process token for this batch process |
uploadID | uploadID is provided when calling /batch/files/initiate-upload |
Response
HTTP Code 204, Successful, No Content returned
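Assembling the request body from the eTags collected during the part uploads can be sketched as below. `build_complete_upload_body` is a hypothetical helper; it assumes you recorded each part's eTag keyed by part number, and it passes the eTag values through unchanged.

```python
def build_complete_upload_body(token: str, upload_id: str, etags: dict[int, str]) -> dict:
    """Build the JSON body for POST /batch/files/complete-upload.

    etags maps part number -> eTag returned when that part was uploaded.
    """
    parts = [
        {"partNumber": number, "eTag": tag}
        for number, tag in sorted(etags.items())  # list parts in ascending order
    ]
    return {"parts": parts, "token": token, "uploadID": upload_id}
```

Note that the request field is spelled uploadID, while the initiate-upload response returns it as uploadId.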
Download URL
This endpoint will provide a download URL for the completed file.
GET /batch/files/download/{token}
Example requests
curl -X GET -H 'x-api-key: (api_key)' https://api.lightboxre.com/v1/batch/files/download/3b4ca555-4d64-46a3-b08b-2d088682f78c
https://api.lightboxre.com/v1/batch/files/download/3b4ca555-4d64-46a3-b08b-2d088682f78c
Parameters
Parameter | Type | Description | Usage |
---|---|---|---|
token | path | Job/Process token | required |
Response
Media type: application/json
{
"$ref": "string",
"signedUrl": {
"expiry": "60 minutes",
"token": "3b4ca2ae-4d64-46a3-b08b-2d088682f78c",
"url": "https://foo.amazonaws.com/uat/files/5555555-4d64-46a3-b08b-2d088682f78c/filename.csv?X-Amz-Algorithm=AWS4-5555555SHA256&X-Amz-Credential=555555555PFKBAAUEOSV3%2F20240731%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240731T181736Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&x-id=PutObject&X-Amz-Signature=55555555555c8f7d61848317ba3a80863eb3d2310c74467edad295928"
}
}
Field | Description |
---|---|
$ref | URL reference back to this call |
signedUrl.expiry | Lifetime of this signed URL. After this period elapses, the URL becomes invalid. |
signedUrl.token | Job token. This token is used to reference this process or job to get job status, clean up files, cancel a job, or retrieve a download link. |
signedUrl.url | Pre-signed URL from which the completed file is downloaded |
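Because result files can be large, it is usually better to stream the download to disk than to hold it in memory. A minimal sketch using the Python standard library follows; `download_to` is an illustrative name, and it assumes the signed URL can be fetched with a plain GET and no extra headers.

```python
import shutil
import urllib.request


def download_to(signed_url: str, dest_path: str) -> None:
    """Stream the content at signed_url into the file at dest_path."""
    with urllib.request.urlopen(signed_url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads and writes in chunks, so memory use stays bounded
        shutil.copyfileobj(response, out)
```

Request a fresh download URL if the previous one has passed its expiry.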
Clean up files
This endpoint provides the means to remove data related to a job from the system.
DELETE /batch/files/{token}
Allows direct removal of the input and output files associated with the batch processing job.
Example requests
curl -X DELETE -H 'x-api-key: (api_key)' https://api.lightboxre.com/v1/batch/files/3b4ca555-4d64-46a3-b08b-2d088682f78c
https://api.lightboxre.com/v1/batch/files/3b4ca555-4d64-46a3-b08b-2d088682f78c
Parameters
Parameter | Type | Description | Usage |
---|---|---|---|
token | path | Job/Process token | required |
Response
HTTP Code 202, Accepted, No Content returned. This status is returned if the job is currently running; the system first cancels the job and then removes the files.
HTTP Code 204, Successful, No Content returned
Start a Job
Once the data has been uploaded and you have a job token, this endpoint will add the job to the queue for processing.
POST /batch/process/{token}
Example requests
curl -d '{see below}' -H "Content-Type: application/json" -H 'x-api-key: (api_key)' -X POST https://api.lightboxre.com/v1/batch/process/{token}
Parameters
Parameter | Type | Description | Usage |
---|---|---|---|
token | path | Job/Process token | required |
Request Body
{
"apis": [
{
"name": "geocoding-engine",
"url": "/v1/search?text={1}+{2}+{3}+{4}",
"outputColumns": [
{
"source": "features.0.geometry.coordinates.0",
"target": "latitude"
}
]
}
]
}
Field | Description |
---|---|
apis | Collection of APIs to be used in the process. Currently, a single API is supported. |
apis[0].name | Name of the API to be used. This is not used by the processor but allows you to name the API in your JSON object for clarity. |
apis[0].url | URL of this API. Note the {1}, {2}, etc.: these are references to the CSV columns that hold the address parts. In this example, column 1 holds the address, column 2 holds the city, column 3 holds the state, and column 4 holds the zip code. |
apis[0].outputColumns | Collection of columns mapped from the geocode response to your output file. |
apis[0].outputColumns[0].source | Source column, output from the geocode response. In the example above you will see features.0. features is an array and 0 is the index for this array. |
apis[0].outputColumns[0].target | Target field name. This field will be added to the output CSV file with this name and the value from the source mapping. |
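The request body can be generated from a list of address columns and a source-to-target mapping rather than written by hand. The sketch below is one way to do that; `build_job_body` is a hypothetical helper, and it assumes, based on the example above, that column references are 1-based and joined with "+".

```python
def build_job_body(address_columns, output_map, name="geocoding-engine"):
    """Build the JSON body for POST /batch/process/{token}.

    address_columns: CSV column numbers (1-based) holding the address parts.
    output_map: dict of geocode response path -> output CSV column name.
    """
    placeholders = "+".join("{%d}" % column for column in address_columns)
    return {
        "apis": [
            {
                "name": name,
                "url": "/v1/search?text=" + placeholders,
                "outputColumns": [
                    {"source": source, "target": target}
                    for source, target in output_map.items()
                ],
            }
        ]
    }
```

Calling `build_job_body([1, 2, 3, 4], {"features.0.geometry.coordinates.0": "latitude"})` reproduces the example request body shown above.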
Response
Media type: application/json
{
"$ref": "string",
"status": {
"fileName": "file.csv",
"jobDescription": {
"apis": [
{
"name": "geocoding-engine",
"url": "/v1/search?text={1}+{2}+{3}+{4}",
"outputColumns": [
{
"source": "features.0.geometry.coordinates.0",
"target": "latitude"
}
]
}
],
"rowCount": 5003
},
"performance": {
"averageSpeed": 3.6,
"currentSpeed": 3.6,
"etc": "01 23:59:59",
"processedPercentage": 0.5,
"processedRecords": 3456,
"startedAt": "2024-07-24T10:23:54.430884195-04:00",
"updatedAt": "2024-07-24T10:23:54.430884195-04:00",
"totalRecords": 3456
},
"status": "STARTING"
}
}
Field | Description |
---|---|
$ref | URL reference back to this call |
status.fileName | Name of the input file |
status.jobDescription | Description object that was passed in /batch/process/{token} |
status.performance | Performance metrics for the current process. They indicate how fast the process is running and when it is likely to complete. |
status.status | What status the current process is in. Possible values are:
|
Return status on a Job/Process
This endpoint will provide status on a job or process.
GET /batch/process/{token}
Example requests
curl -X GET -H 'x-api-key: (api_key)' https://api.lightboxre.com/v1/batch/process/3b4ca555-4d64-46a3-b08b-2d088682f78c
https://api.lightboxre.com/v1/batch/process/3b4ca555-4d64-46a3-b08b-2d088682f78c
Parameters
Parameter | Type | Description | Usage |
---|---|---|---|
token | path | Job/Process token | required |
Response
Media type: application/json
{
"$ref": "string",
"status": {
"fileName": "file.csv",
"jobDescription": {
"apis": [
{
"name": "geocoding-engine",
"url": "/v1/search?text={1}+{2}+{3}+{4}",
"outputColumns": [
{
"source": "features.0.geometry.coordinates.0",
"target": "latitude"
}
]
}
],
"rowCount": 5003
},
"performance": {
"averageSpeed": 3.6,
"currentSpeed": 3.6,
"etc": "01 23:59:59",
"processedPercentage": 0.5,
"processedRecords": 3456,
"startedAt": "2024-07-24T10:23:54.430884195-04:00",
"updatedAt": "2024-07-24T10:23:54.430884195-04:00",
"totalRecords": 3456
},
"status": "STARTING"
}
}
Field | Description |
---|---|
$ref | URL reference back to this call |
status.fileName | Name of the input file |
status.jobDescription | Description object that was passed in /batch/process/{token} |
status.performance | Performance metrics for the current process. They indicate how fast the process is running and when it is likely to complete. |
status.status | What status the current process is in. Possible values are:
|
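Monitoring a job usually means calling this endpoint on an interval until the job stops. The sketch below shows one possible polling loop. It is an assumption-laden illustration: `wait_for_job` is a hypothetical helper, `fetch_status` is any callable you supply that returns the parsed JSON response, and the set of terminal status names must come from you, since this page's example shows only STARTING.

```python
import time


def wait_for_job(fetch_status, terminal_statuses, interval_seconds=30.0,
                 max_polls=1000, sleep=time.sleep):
    """Poll until status.status is in terminal_statuses, then return the payload.

    fetch_status: callable returning the parsed GET /batch/process/{token} response.
    Raises TimeoutError if max_polls is reached first.
    """
    for _ in range(max_polls):
        payload = fetch_status()
        if payload["status"]["status"] in terminal_statuses:
            return payload
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("job did not reach a terminal status")
```

The `sleep` parameter exists so the loop can be exercised without real delays; the `etc` and `processedPercentage` fields in `status.performance` could be used to pick a sensible interval.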
Cancel a Job/Process
This endpoint provides you with the means to cancel a job/process.
DELETE /batch/process/{token}
Example requests
curl -X DELETE -H 'x-api-key: (api_key)' https://api.lightboxre.com/v1/batch/process/3b4ca555-4d64-46a3-b08b-2d088682f78c
https://api.lightboxre.com/v1/batch/process/3b4ca555-4d64-46a3-b08b-2d088682f78c
Parameters
Parameter | Type | Description | Usage |
---|---|---|---|
token | path | Job/Process token | required |
Response
HTTP Code 202, Accepted, No Content returned
Geocoding Mapping Options (Source Output Columns)
This table outlines the mapping options from source to target and what each field represents. We recommend carrying the fields marked with a '*' in the Best Practice column as these fields are the most important.
Source | Best Practice | Target | Description |
---|---|---|---|
geocoding.query.text | | AddressInput | Address input text |
geocoding.query.parsedText.streetNumber | | ParsedStreetNumber | Street number as parsed from the input text |
geocoding.query.parsedText.streetName | | ParsedStreetName | Street name as parsed from the input text |
geocoding.query.parsedText.locality | | ParsedCity | City name as parsed from the input text |
geocoding.query.parsedText.region | | ParsedState | State as parsed from the input text |
geocoding.query.parsedText.postalCode | | ParsedZipCode | Zip code as parsed from the input text |
features.0.geometry.coordinates.0 | * | Latitude | Latitude portion of the location of the address |
features.0.geometry.coordinates.1 | * | Longitude | Longitude portion of the location of the address |
features.0.properties.label | * | AddressLabel | Matched address label |
features.0.properties.streetNumber | * | MatchedStreetNumber | Matched address street number |
features.0.properties.streetName | * | MatchedStreetName | Matched address street name |
features.0.properties.postalCode | * | MatchedZipCode | Matched address zip code |
features.0.properties.localAdmin | | MatchedLocalAdmin | Matched address local admin |
features.0.properties.locality | * | MatchedCity | Matched address city name |
features.0.properties.county | * | MatchedCounty | Matched address county name |
features.0.properties.regionCode | * | MatchedState | Matched address state |
features.0.properties.countryCode | | MatchedCountry | Matched address country |
features.0.properties.continent | | MatchedContinent | Matched address continent |
features.0.properties.score | * | GeocodeScore | Overall address match score |
features.0.properties.scoreComponents.streetNumber | | GeocodeStreetNumberScore | Street number score |
features.0.properties.scoreComponents.streetName | | GeocodeStreetNameScore | Street name score |
features.0.properties.scoreComponents.locality | | GeocodeCityScore | City score |
features.0.properties.scoreComponents.postalCode | | GeocodeZipCodeScore | Zip code score |
features.0.properties.changeFlag.streetType | | ChangeFlagStreetType | A flag that denotes a change from the input value to the matched value. Possible values are Fuzzy, Added, Changed, Removed, Alias. |
features.0.properties.changeFlag.streetName | | ChangeFlagStreetName | A flag that denotes a change from the input value to the matched value. Possible values are Fuzzy, Added, Changed, Removed, Alias. |
features.0.properties.changeFlag.locality | | ChangeFlagCity | A flag that denotes a change from the input value to the matched value. Possible values are Fuzzy, Added, Changed, Removed, Alias. |
features.0.properties.changeFlag.postalCode | | ChangeFlagZipCode | A flag that denotes a change from the input value to the matched value. Possible values are Fuzzy, Added, Changed, Removed, Alias. |
features.0.properties.attributes.fipsCode | | FIPSCode | County FIPS Code |
features.0.properties.attributes.address_lid | * | ADDRESS_LID | Unique LightBox ID for this address |
features.0.properties.attributes.parcel_lid.0 | * | PARCEL_LID | Related parcel record to this address |
features.0.properties.attributes.assessment_lid.0 | * | ASSESSMENT_LID | Related assessment record to this address |
features.0.properties.attributes.building_lid.0 | * | STRUCTURE_LID | Related structure record to this address |
features.0.properties.attributes.precisionCode | * | PrecisionCode | See Geocode Precision Code for details |
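To see how a dotted source path such as features.0.properties.label selects a value, the sketch below walks a parsed JSON document one segment at a time, treating a segment as a list index whenever the current value is a list. This illustrates the path notation as this page describes it; it is not the batch processor's actual implementation, and `resolve_path` and the sample document are invented for the example.

```python
def resolve_path(document, path: str):
    """Follow a dotted path through nested dicts and lists."""
    current = document
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]  # numeric segment indexes a list
        else:
            current = current[segment]       # otherwise it is a dict key
    return current


# Illustrative response fragment (values are made up)
sample = {
    "features": [
        {
            "geometry": {"coordinates": [10.5, 20.25]},
            "properties": {"label": "1 Example St", "attributes": {"parcel_lid": ["P1"]}},
        }
    ]
}
label = resolve_path(sample, "features.0.properties.label")
parcel = resolve_path(sample, "features.0.properties.attributes.parcel_lid.0")
```

Checking a mapping against a single interactive geocode response in this way can catch a misspelled source path before a large job is submitted.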
HTTP Error Codes
Code | Description |
---|---|
200 | The request succeeded. |
201 | The object was created successfully. |
202 | Accepted, no content. |
204 | Successful, no content. The server has successfully fulfilled the request and there is no additional content to send in the response payload body. Typically returned on a DELETE. |
400 | One or more of the request parameters were invalid. |
401 | The client must authenticate itself to get the requested response. Note: This can also occur because your trial key has expired. |
404 | The server cannot find the requested resource. This can also mean that the endpoint is valid but the resource itself does not exist. |
429 | Too many requests were made in a short period of time, or you have exceeded your request-lot pool. |
500 | The server has encountered an error it does not know how to handle. |
503 | Service Unavailable. |
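A client will typically branch on these codes. The sketch below shows one possible policy; treating 429 and 503 as retryable with exponential backoff is an assumption made for illustration, not guidance from this page, and both function names are hypothetical.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to 'ok', 'retry', or 'fail'."""
    if code in (200, 201, 202, 204):
        return "ok"
    if code in (429, 503):
        # Assumption: rate limiting and temporary unavailability may clear up
        return "retry"
    return "fail"


def backoff_delays(attempts: int, base_seconds: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2 * base, 4 * base, ..."""
    return [base_seconds * (2 ** i) for i in range(attempts)]
```

Codes such as 400, 401, and 404 are classified as failures here because repeating the same request is unlikely to change the outcome.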