Create your own action#
Introduction#
In this guide, we will create a simple Roboto Action that tags a dataset if a specific keyword (“error”) is found in a log file.
While this is a basic example, it can be expanded for more advanced log post-processing and tagging. For instance, you could calculate metrics based on log contents and automatically tag a dataset if certain values exceed predefined thresholds.
Here’s what we’ll cover:
Initializing a new action using the Roboto CLI
Building a Docker image that contains a Python script to handle dataset tagging
Creating a new action using the Roboto CLI
Manually invoking the action on a dataset to test its functionality
Setting up a trigger to automatically invoke the action when a matching file is uploaded
You can find the complete code for this example on GitHub: roboto-example-action.
Prerequisites#
Ensure you have a Roboto Account.
Set up Programmatic Access to generate an access token and install the Roboto CLI.
Install Docker and have it running on your computer.
Initialize a new action#
Use the Roboto CLI to initialize a new action project in a local directory:
roboto actions init
This will present you with a few configuration options:
Select Roboto Python Action (Option 2)
Give your project a name: tag_dataset
initialize_git_repo: y
After initialization, a new project will be created with the following directory structure:
tag_dataset/
├── Dockerfile
├── README.md
├── action.json
├── requirements.dev.txt
├── requirements.runtime.txt
├── scripts
│   ├── build.sh
│   ├── deploy.sh
│   ├── run.sh
│   └── setup.sh
└── src
    └── tag_dataset
        ├── __init__.py
        └── __main__.py
The key components of the project include:
action.json: Defines the action, including its description and parameters.
Dockerfile: Contains the configuration for the action’s Docker image.
scripts: Utility scripts for building and deploying the action.
src: Contains the Python code specific to the action.
Update the action’s code#
Replace the code in src/tag_dataset/__main__.py with:
import logging

from roboto import ActionRuntime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

if __name__ == "__main__":
    # Read the invocation context (dataset, input directory) from the environment.
    runtime = ActionRuntime.from_env()
    dataset = runtime.dataset
    log_path = runtime.input_dir / "log.txt"
    keyword = "error"

    # Tag the dataset if the keyword appears anywhere in the log (case-sensitive).
    with open(log_path, "r") as log_file:
        if keyword in log_file.read():
            dataset.put_tags([keyword])
            logger.info(f"Found '{keyword}' in log file.")
        else:
            logger.info(f"'{keyword}' not found in log file.")
This is how it works
This script initializes roboto.ActionRuntime, a utility class for reading from and interacting with the action’s runtime environment. It makes it easy to access a reference to the dataset via the roboto.ActionRuntime.dataset property. The script scans an input log.txt file and calls dataset.put_tags() if it finds the word error in the log.
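The metric-based tagging mentioned in the introduction could follow the same pattern. The sketch below is only an illustration, not part of the generated project: the helper names count_levels and tags_for_log, the THRESHOLDS table, and the many_warnings tag are all hypothetical, and it assumes log lines shaped like "<timestamp> - <LEVEL> - <message>".

```python
from collections import Counter

# Hypothetical thresholds: tag name -> (log level to count, minimum count).
THRESHOLDS = {
    "error": ("ERROR", 1),
    "many_warnings": ("WARNING", 5),
}


def count_levels(log_text: str) -> Counter:
    """Count log levels in lines shaped like '<timestamp> - <LEVEL> - <message>'."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split(" - ", 2)
        if len(parts) == 3:
            counts[parts[1].strip()] += 1
    return counts


def tags_for_log(log_text: str, thresholds: dict = THRESHOLDS) -> list:
    """Return the tags whose level count meets or exceeds its threshold."""
    counts = count_levels(log_text)
    return sorted(
        tag
        for tag, (level, minimum) in thresholds.items()
        if counts[level] >= minimum
    )
```

Inside the action, the returned list could be passed to dataset.put_tags(), the same call the script above uses, whenever the list is non-empty.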
Build a Docker image#
In the top-level directory with the Dockerfile, run the following commands:
./scripts/setup.sh
./scripts/build.sh
This will set up a local dev environment, install dependencies, and build a Docker image called tag_dataset with the tag latest.
Deploy the action#
Deploy the tag_dataset:latest image to Roboto and create an action that uses it:
./scripts/deploy.sh
Note
If you have your own image, you can push it to the registry explicitly via: roboto images push <my-image:latest>
Create a log file and dataset#
To test the action we just made, we can create a dummy log file. Run this command in your terminal to create a log.txt file:
( for i in {1..10}; do echo "$(date '+%Y-%m-%d %H:%M:%S') - INFO - Log message $i"; done; echo "$(date '+%Y-%m-%d %H:%M:%S') - ERROR - An error occurred"; for i in {11..20}; do echo "$(date '+%Y-%m-%d %H:%M:%S') - INFO - Log message $i"; done; ) > log.txt
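If your shell does not support brace expansion, a Python sketch along the following lines should produce an equivalent file. The function names build_log_lines and write_log are illustrative only, and unlike the shell command, this version uses a single timestamp for every line.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_log_lines(now: Optional[datetime] = None) -> list:
    """Build 10 INFO lines, one ERROR line, then 10 more INFO lines."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    lines = [f"{stamp} - INFO - Log message {i}" for i in range(1, 11)]
    lines.append(f"{stamp} - ERROR - An error occurred")
    lines += [f"{stamp} - INFO - Log message {i}" for i in range(11, 21)]
    return lines


def write_log(path: str = "log.txt") -> Path:
    """Write the dummy log to `path` and return the path."""
    target = Path(path)
    target.write_text("\n".join(build_log_lines()) + "\n")
    return target
```

Calling write_log() from the directory where you want the file creates log.txt with 21 lines, including the "An error occurred" line that the action's keyword check looks for.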
Now create a dataset:
roboto datasets create
You will get an output like this:
{
...
"dataset_id": "ds_bopf33kzwisr",
...
}
Copy the dataset_id and use it to upload the log.txt file:
roboto datasets upload-files -d <dataset_id> -p ./log.txt
You can verify that the file was uploaded by running:
roboto datasets list-files -d <dataset_id>
Invoke the action#
Next, we’ll manually invoke the action on the dataset we created above:
roboto actions invoke tag_dataset --dataset-id <dataset_id> --input-data "log.txt"
You will get an invocation_id as an output. You can use this to check the status of the invocation:
roboto invocations status --tail <invocation_id>
Once the invocation is complete, you can check the dataset to see if the tag was added:
roboto datasets show -d <dataset_id>
You should see the following output:
{
...
"tags": [
"error"
],
...
}
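If you want to check the result in a script rather than by eye, a small helper like the one below could inspect the JSON. This is a sketch under one assumption: that the command prints a single JSON object with a top-level "tags" list, as in the excerpt above. The function name has_tag is hypothetical.

```python
import json


def has_tag(dataset_json: str, tag: str) -> bool:
    """Return True if `tag` appears in the "tags" list of a dataset JSON document."""
    record = json.loads(dataset_json)
    return tag in record.get("tags", [])
```

For example, you could save the command's output to a file and pass the file's contents to has_tag(contents, "error").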
You can also go to your account on Roboto to inspect the dataset and see the new tag.

Create a trigger (optional)#
Tip
Triggers enable you to run an action automatically when a new file is uploaded to a dataset and meets certain conditions.
In this example, we will set up a trigger that automatically runs our action when a file named log.txt is uploaded to a new dataset.
roboto triggers create --name tag_dataset_trigger --action tag_dataset --required-inputs 'log.txt' --for-each dataset_file
To test it out, create a new dataset and upload a log.txt file to it.
Conclusion#
And that’s it! You’ve now created your own custom Roboto Action, complete with automated tagging for datasets. With this setup, you can explore the full flexibility of Roboto Actions—whether for processing, transforming, or analyzing your data.
Remember, Roboto Actions are general-purpose and can handle a wide range of tasks. You can refer to the Python SDK reference for more details on available commands and explore ways to manipulate datasets, files, topics, and more directly within your action. This flexibility opens up powerful workflows for automating and scaling your data operations.