Separated payloads, optionally with lifetime

In processes that deal with sensitive or personal data, for instance, there may be a requirement to remove that data periodically.

TOS can separate payloads into their own collection and give them a set lifetime based on the updatedAt field. The solution is based on MongoDB TTL indexes, which automatically remove documents from a collection after a set lifetime.

Note that payloads can also be separated into their own collection without setting the TTL index.

Setup

If separate_payloads is supplied during TOS initialization, payloads will be separated to their own collection.

If payloads_ttl (for TOSLibrary) or payloads_ttl_seconds (for tos) is also supplied, the required index is created automatically if it does not already exist. Note that TOSLibrary accepts the lifetime either as seconds or as a timestring (such as 30 days), while bare tos accepts the value only as seconds.

For reference, initialization with Robot Framework:

*** Settings ***
Library  TOSLibrary  ${db_server}:${db_port}  ${db_name}  ${db_user}  ${db_passw}
...  separate_payloads=${TRUE}  payloads_ttl=30 days

Or with pure Python:

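# Import path assumed from the tos package layout.
from tos.task_object_storage import TaskObjectStorage

tos = TaskObjectStorage(
    db_server="localhost:27017",
    db_name="testing",
    db_auth_source="admin",
    db_user="robo-user",
    db_passw="secret-word",
)
# Separate payloads and let them expire 50000 seconds after the last update.
tos.initialize_tos(separate_payloads=True, payloads_ttl_seconds=50000)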
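
Because bare tos accepts the lifetime only as seconds, a lifetime expressed in days or hours has to be converted first. A minimal sketch using only the Python standard library; the variable names are illustrative and not part of TOS:

```python
from datetime import timedelta

# Express the lifetime in readable units, then convert it to whole seconds.
payloads_ttl = timedelta(days=30)
payloads_ttl_seconds = int(payloads_ttl.total_seconds())

print(payloads_ttl_seconds)  # 2592000
```

The resulting integer can then be passed as payloads_ttl_seconds to initialize_tos.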

A lifetime cannot be set without separating payloads into their own collection. Supplying only the lifetime argument, without the argument for separating payloads, causes an error.

The structure

When a task object is created after initialization, the task object and its associated payload have the following structure.

The task object contains a reference to the payload document:

{
  "_id": ObjectId("5c519c08cd9c9f140f95b427"),
  ...
  "payload": {
      "_id": ObjectId("60db0d52d30efa2804f80a8c")
  }
  ...
}

The payload document contains the payload object, as well as its id and its creation and update timestamps:

{
  "_id": ObjectId("60db0d52d30efa2804f80a8c"),
  "payload": {
      "this_is": "the true payload!",
      "with": "many fields!"
  },
  "updatedAt": Date(*Timestamp of last update*),
  "createdAt": Date(*Timestamp of creation*)
}

As illustrated, the payload document does not contain a reference to its parent task object.
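
The one-way reference can be illustrated with plain dictionaries. The sketch below is not TOS code: resolve_payload and payload_documents are hypothetical names, and plain strings stand in for ObjectId values so that the example runs without MongoDB. It follows the reference from the task object to the payload document and returns a copy of the task object with the actual payload filled in.

```python
import copy

# Hypothetical in-memory stand-ins for the task object and payload collections.
# Plain strings replace ObjectId values so the sketch runs without MongoDB.
task_object = {
    "_id": "5c519c08cd9c9f140f95b427",
    "payload": {"_id": "60db0d52d30efa2804f80a8c"},
}
payload_documents = {
    "60db0d52d30efa2804f80a8c": {
        "_id": "60db0d52d30efa2804f80a8c",
        "payload": {"this_is": "the true payload!", "with": "many fields!"},
    },
}


def resolve_payload(task_object, payload_documents):
    """Return a copy of the task object with the referenced payload filled in."""
    payload_id = task_object["payload"]["_id"]
    # Raises KeyError if the payload document is missing (for example, expired).
    payload_document = payload_documents[payload_id]
    merged = copy.deepcopy(task_object)
    merged["payload"] = copy.deepcopy(payload_document["payload"])
    return merged


merged = resolve_payload(task_object, payload_documents)
print(merged["payload"]["this_is"])  # the true payload!
```

Since the payload document holds no reference back to its parent, the lookup only works in this direction, starting from the task object.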

Usage

Separated payloads

When separated payloads are used, TOS can be used as with regular payloads; the package abstracts away the separation of payloads for some methods. Check whether the method you use returns a task object with a merged payload or a task object with a “bare” payload (containing only the reference to the separate payload document).

As a rule of thumb:

  • The creator returns a merged task object
  • Most setter and update methods that target the payload return a merged task object
  • Update and setter methods that do not modify the payload return a bare task object
  • Finder methods return a merged task object, except for methods that find a specific task object

These rules are of no concern when using RPALibrary, which returns the merged payload to the user by default.

Warning

When migrating an existing solution to separated payloads, check any existing finder-method calls in particular. If the payload contents returned by these calls are used anywhere, the calls will break after migration, because they then return a bare task object whose payload contains just the reference. To fix the issue, replace the calls with the new finder methods that return a merged task object.
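
During migration it can help to detect bare payloads explicitly instead of failing later on a missing field. The helper below is a hypothetical heuristic, not part of TOS: it assumes, based on the structure shown earlier, that a bare payload is a dictionary holding only the "_id" reference.

```python
def has_bare_payload(task_object):
    """Heuristic check: a bare payload holds only the "_id" reference.

    Assumption: real payloads never consist of a single "_id" key.
    """
    payload = task_object.get("payload")
    return isinstance(payload, dict) and set(payload) == {"_id"}


bare = {"_id": "task-1", "payload": {"_id": "payload-1"}}
merged = {"_id": "task-1", "payload": {"this_is": "the true payload!"}}

print(has_bare_payload(bare))    # True
print(has_bare_payload(merged))  # False
```

A check like this could guard code paths that read payload contents, so that a leftover finder call surfaces as a clear error.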

Dealing with lifetime

Once payloads_ttl_seconds has been supplied, the TTL index is enforced on every initialization of TOS. If the TTL value needs to be adjusted later, refer to the MongoDB documentation for the operations that modify an existing index, run the required commands manually, and supply the new value on the next initialization of TOS. Attempts to initialize TOS with a different TTL value without modifying the existing index should fail.
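
MongoDB's collMod command can change the expireAfterSeconds value of an existing TTL index. The sketch below only builds the command document; build_ttl_collmod is a hypothetical helper, the collection name "payloads" is a placeholder, and the key pattern assumes a single-field ascending index on updatedAt. Verify all three against the actual database before running anything.

```python
def build_ttl_collmod(collection_name, ttl_seconds):
    """Build a collMod command that sets a new TTL on the updatedAt index."""
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return {
        "collMod": collection_name,
        "index": {
            # Assumed key pattern: single-field ascending index on updatedAt.
            "keyPattern": {"updatedAt": 1},
            "expireAfterSeconds": int(ttl_seconds),
        },
    }


# "payloads" is a placeholder collection name, not confirmed by TOS.
command = build_ttl_collmod("payloads", 60 * 60 * 24 * 7)
print(command["index"]["expireAfterSeconds"])  # 604800
```

With pymongo, such a dictionary could be passed to the database handle as db.command(command). After that, TOS would be initialized with the same new value.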

The payload is indexed on the updatedAt field, which is set when the task object is created and is automatically updated to the current time when the payload is modified with the corresponding methods of the package. Once the time elapsed since updatedAt reaches the TTL given at initialization, MongoDB automatically removes the payload document. Note that the deletion is not guaranteed to occur immediately.
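
The expiry rule can be written out as simple date arithmetic. The function below is an illustrative sketch, not part of TOS or MongoDB: it reports when a payload document becomes eligible for removal. The actual deletion happens later, since MongoDB removes expired documents in a background task that runs periodically (every 60 seconds by default).

```python
from datetime import datetime, timedelta, timezone


def is_expired(updated_at, ttl_seconds, now=None):
    """Return True once the TTL has elapsed since the last update."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= updated_at + timedelta(seconds=ttl_seconds)


updated_at = datetime(2021, 6, 29, 12, 0, tzinfo=timezone.utc)
ttl_seconds = 50000

just_before = updated_at + timedelta(seconds=ttl_seconds - 1)
at_deadline = updated_at + timedelta(seconds=ttl_seconds)

print(is_expired(updated_at, ttl_seconds, now=just_before))  # False
print(is_expired(updated_at, ttl_seconds, now=at_deadline))  # True
```

Each modification through the package resets updatedAt, so the deadline moves forward with every payload update.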

Warning

Make sure payloads live long enough for the whole process to run. Attempting to process a task object with an expired payload will raise an exception.