Create your own action#

Introduction#

In this guide, we will create a simple Roboto Action that tags a dataset if a specific keyword (“error”) is found in a log file.

While this is a basic example, it can be expanded for more advanced log post-processing and tagging. For instance, you could calculate metrics based on log contents and automatically tag a dataset if certain values exceed predefined thresholds.

Here’s what we’ll cover:

  1. Initializing a new action using the Roboto CLI

  2. Building a Docker image that contains a Python script to handle dataset tagging

  3. Creating a new action using the Roboto CLI

  4. Manually invoking the action on a dataset to test its functionality

  5. Setting up a trigger to automatically invoke the action when a matching file is uploaded

You can find the complete code for this example on GitHub: roboto-example-action.

Prerequisites#

To follow along, you will need the Roboto CLI installed and configured, and Docker available locally to build the action's image.

Initialize a new action#

Use the Roboto CLI to initialize a new action project in a local directory:

roboto actions init

This will present you with a few configuration options:

  • Select Roboto Python Action (Option 2)

  • Give your project a name: tag_dataset

  • initialize_git_repo: y

After initialization, a new project will be created with the following directory structure:

tag_dataset/
├── Dockerfile
├── README.md
├── action.json
├── requirements.dev.txt
├── requirements.runtime.txt
├── scripts
│   ├── build.sh
│   ├── deploy.sh
│   ├── run.sh
│   └── setup.sh
└── src
    └── tag_dataset
        ├── __init__.py
        └── __main__.py

The key components of the project include:

  • action.json: Defines the action, including its description and parameters.

  • Dockerfile: Contains the configuration for the action’s Docker image.

  • scripts: Utility scripts for building and deploying the action.

  • src: Contains the Python code specific to the action.

Update the action’s code#

Replace the code in src/tag_dataset/__main__.py with:

import logging
from roboto import ActionRuntime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

if __name__ == "__main__":
    runtime = ActionRuntime.from_env()
    dataset = runtime.dataset

    log_path = runtime.input_dir / "log.txt"
    keyword = "error"

    with open(log_path, "r") as log_file:
        if keyword in log_file.read():
            dataset.put_tags([keyword])
            logger.info(f"Found '{keyword}' in log file.")
        else:
            logger.info(f"'{keyword}' not found in log file.")

How it works

The script initializes roboto.ActionRuntime, a utility class for interacting with the action's runtime environment. It provides easy access to the dataset the action was invoked on via the roboto.ActionRuntime.dataset property. The script reads the input log.txt file and calls dataset.put_tags() if it finds the word "error" in the log.
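
As a sketch of the extension mentioned in the introduction, the same pattern can drive metric-based tagging. The example below is illustrative rather than part of the Roboto SDK: the 5% threshold and the high-error-rate tag are made up for this sketch, and it assumes runtime.input_dir behaves like a pathlib.Path (which the template's use of the / operator suggests).

import logging

from roboto import ActionRuntime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

if __name__ == "__main__":
    runtime = ActionRuntime.from_env()
    dataset = runtime.dataset

    # Compute a simple metric: the fraction of log lines marked ERROR.
    log_path = runtime.input_dir / "log.txt"
    lines = log_path.read_text().splitlines()
    error_count = sum(1 for line in lines if "ERROR" in line)
    error_rate = error_count / max(len(lines), 1)

    # Tag the dataset only when the metric exceeds a chosen threshold.
    # Both the 5% threshold and the tag name are illustrative.
    if error_rate > 0.05:
        dataset.put_tags(["high-error-rate"])
        logger.info(f"Error rate {error_rate:.1%} exceeds threshold; tagged dataset.")
    else:
        logger.info(f"Error rate {error_rate:.1%} is below threshold; no tag added.")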

Build a Docker image#

From the project's top-level directory (the one containing the Dockerfile), run the following commands:

./scripts/setup.sh
./scripts/build.sh

This will set up a local dev environment, install dependencies, and build a Docker image called tag_dataset with the tag latest.

Deploy the action#

Deploy the tag_dataset:latest image to Roboto and create an action that uses it:

./scripts/deploy.sh

Note

If you have your own image, you can push it to the registry explicitly via: roboto images push <my-image:latest>

Create a log file and dataset#

To test the action we just made, we can create a dummy log file. Run this command in your terminal to create a log.txt file:

(
  for i in {1..10}; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') - INFO - Log message $i"
  done
  echo "$(date '+%Y-%m-%d %H:%M:%S') - ERROR - An error occurred"
  for i in {11..20}; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') - INFO - Log message $i"
  done
) > log.txt

Now create a dataset:

roboto datasets create

You will get output like this:

{
   ...
   "dataset_id": "ds_bopf33kzwisr",
   ...
}

Copy the dataset_id and use it to upload the log.txt file:

roboto datasets upload-files -d <dataset_id> -p ./log.txt

You can verify that the file was uploaded by running:

roboto datasets list-files -d <dataset_id>
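
These steps can also be done from Python instead of the CLI. The sketch below is my reading of the Python SDK and not taken from this guide: the Dataset.create() and upload_file() calls, and the dataset_id attribute, should be confirmed against the Python SDK reference before relying on them.

import pathlib

from roboto import Dataset  # assumes Dataset is exported at the package root

# Create a new dataset, the SDK equivalent of `roboto datasets create`.
dataset = Dataset.create()

# Upload the local log.txt, the SDK equivalent of
# `roboto datasets upload-files -d <dataset_id> -p ./log.txt`.
dataset.upload_file(pathlib.Path("log.txt"), "log.txt")

print(dataset.dataset_id)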

Invoke the action#

Next, we’ll manually invoke the action on the dataset we created above:

roboto actions invoke tag_dataset --dataset-id <dataset_id> --input-data "log.txt"

The command outputs an invocation_id, which you can use to check the status of the invocation:

roboto invocations status --tail <invocation_id>

Once the invocation is complete, you can check the dataset to see if the tag was added:

roboto datasets show -d <dataset_id>

You should see the following output:

{
    ...
    "tags": [
        "error"
    ],
   ...
}

You can also go to your account on Roboto to inspect the dataset and see the new tag:

Tag visible on dataset

Create a trigger (optional)#

Tip

Triggers let you run an action automatically when a file uploaded to a dataset meets certain conditions.

In this example, we will set up a trigger that automatically runs our action when a file named log.txt is uploaded to a new dataset.

roboto triggers create --name tag_dataset_trigger --action tag_dataset --required-inputs 'log.txt' --for-each dataset_file

To test it out, create a new dataset and upload a log.txt file to it.

Conclusion#

And that’s it! You’ve now created your own custom Roboto Action, complete with automated tagging for datasets. With this setup, you can explore the full flexibility of Roboto Actions—whether for processing, transforming, or analyzing your data.

Remember, Roboto Actions are general-purpose and can handle a wide range of tasks. You can refer to the Python SDK reference for more details on available commands and explore ways to manipulate datasets, files, topics, and more directly within your action. This flexibility opens up powerful workflows for automating and scaling your data operations.
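
For instance, a slightly more general version of this guide's action could scan every text file in the input directory rather than a single hard-coded log.txt. The sketch below builds only on the ActionRuntime pieces used earlier and again assumes runtime.input_dir behaves like a pathlib.Path; the keyword-to-tag mapping is illustrative.

import logging

from roboto import ActionRuntime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Keywords to search for and the tags to apply when they appear (illustrative).
KEYWORD_TAGS = {"error": "error", "warning": "warning"}

if __name__ == "__main__":
    runtime = ActionRuntime.from_env()
    dataset = runtime.dataset

    tags_to_apply = set()
    # Assumes runtime.input_dir is a pathlib.Path, so rglob() is available.
    for path in runtime.input_dir.rglob("*.txt"):
        contents = path.read_text().lower()
        for keyword, tag in KEYWORD_TAGS.items():
            if keyword in contents:
                tags_to_apply.add(tag)

    if tags_to_apply:
        dataset.put_tags(sorted(tags_to_apply))
        logger.info(f"Applied tags: {sorted(tags_to_apply)}")
    else:
        logger.info("No keywords found; no tags applied.")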