Scheduling

The Scheduling Protocol #

You can schedule jobs through the Scheduling Protocol, which is built around the /schedule endpoint (TODO: link to API docs).

Discovering job specifications #

To schedule a job, you need its job specification. If your jobs are defined in your own app, you are all set. However, if they are defined in other apps, you’ll either have to refer to them by identifier or fetch information about them dynamically through the Management Protocol. The officially supported SDKs include tooling to generate stubs for calling remote jobs.

Scheduling individual jobs #

The easiest way to schedule jobs is individually: post to the schedule endpoint with the name of the job spec you want to run, along with optional configuration such as parameters and a target queue.

You’ll receive a confirmation that scheduling succeeded, along with a job ID that can be used to check for updates and results. You can also specify callback endpoints if you want Rota to push status updates or results to you instead of having to poll for them.
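As a rough sketch, a single-job request body might look like the following. The field names (`job`, `parameters`, `queue`, `callback`) are illustrative assumptions, not the actual Rota API schema:

```python
import json

def build_schedule_request(job_spec, parameters=None, queue=None, callback_url=None):
    """Build a hypothetical request body for POST /schedule (field names assumed)."""
    request = {"job": job_spec}
    if parameters:
        request["parameters"] = parameters   # job-specific inputs
    if queue:
        request["queue"] = queue             # target queue
    if callback_url:
        request["callback"] = callback_url   # push status updates here instead of polling
    return request

body = build_schedule_request(
    "reports.generate_invoice",           # hypothetical job spec name
    parameters={"customer_id": 42},
    queue="low-priority",
    callback_url="https://example.com/rota-updates",
)
print(json.dumps(body, indent=2))
```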

To reduce round trips, you can specify more than one job in a single request.
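A multi-job request could bundle several specs into one body, sketched below under the assumption that the endpoint accepts a `jobs` array (an illustrative shape, not the documented schema):

```python
def build_batch_request(jobs):
    """Bundle several (job_spec, parameters) pairs into one hypothetical request body."""
    return {"jobs": [{"job": name, "parameters": params} for name, params in jobs]}

body = build_batch_request([
    ("images.resize", {"width": 640}),      # hypothetical job names
    ("images.watermark", {"text": "draft"}),
])
```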

Stream batching #

If you have many data points, processing them all in a single task can be inefficient. In that case, it helps to split the large data set into a series of smaller batches. The caller can batch the data manually, but this is not always practical: for instance, loading the entire data set into memory to form the request may be slow or impossible.

To improve this, you can specify the initial job parameters and a batch size, then submit the data set in chunks. By streaming the data set to Rota and letting it split the data into batches for you, you reduce the load on your own application and simplify it as well.

File batching #

Stream batching works, but it still requires loading the data into your application and sending it to Rota. In many cases, you can instead tell Rota to load the data itself, avoiding further round trips and reducing load on your application. For instance, if you have a CSV file in an S3-compatible object store, you can direct Rota to load it and spawn jobs for each row or for batches of rows.

When using file batching, you may specify a single file or a directory which can (at your option) be searched recursively.
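A file-batching request might be shaped like the sketch below, pointing Rota at a source to load on its own. The field names (`source`, `format`, `recursive`, `rows_per_job`) are illustrative assumptions:

```python
def build_file_batch_request(job_spec, url, data_format, rows_per_job=1, recursive=False):
    """Build a hypothetical file-batching request: Rota loads the data itself."""
    return {
        "job": job_spec,
        "source": {
            "url": url,              # file or directory in a supported data source
            "format": data_format,   # one of the supported data formats
            "recursive": recursive,  # search directories recursively, at your option
        },
        "rows_per_job": rows_per_job,  # 1 = one job per row; >1 = batches of rows
    }

request = build_file_batch_request(
    "etl.ingest_row",                # hypothetical job spec name
    "s3://my-bucket/input.csv",
    "csv",
    rows_per_job=100,
)
```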

Supported data sources #

  • HTTP
  • S3 (and compatible)
  • SFTP
  • WebDAV

Supported data formats #

  • Arrow
  • Avro
  • CSV
  • JSON
  • MessagePack
  • NDJSON
  • ODS
  • Parquet
  • Raw (do not parse files, pass them directly to jobs)
  • TOML
  • TSV
  • XLS(X)
  • XML
  • YAML

Workflows #

Scheduling an individual job is often not enough. Frequently, you need to schedule multiple jobs whose outputs feed into one another. Rota calls these chains of jobs workflows.

Workflows are Directed Acyclic Graphs (DAGs). You list a series of jobs along with their inputs and outputs. An input may be either a fixed value specified at schedule time or the output of another job. When a job depends on another job’s output, Rota detects the dependency automatically and orders execution accordingly, running as many jobs in parallel as the graph permits.
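The dependency detection described above can be illustrated with a small sketch. The workflow shape (an input written as `{"from": "<job>"}` referencing another job's output) is an assumption for illustration; the ordering uses the standard-library topological sorter:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definition. An input of the form {"from": "<job>"}
# refers to that job's output; any other value is fixed at schedule time.
workflow = {
    "extract":   {"inputs": {"path": "s3://bucket/data.csv"}},
    "transform": {"inputs": {"rows": {"from": "extract"}}},
    "load":      {"inputs": {"rows": {"from": "transform"}}},
    "report":    {"inputs": {"rows": {"from": "transform"}}},
}

def dependencies(workflow):
    """Infer each job's dependencies from inputs that reference other jobs' outputs."""
    return {
        name: {value["from"]
               for value in spec["inputs"].values()
               if isinstance(value, dict) and "from" in value}
        for name, spec in workflow.items()
    }

deps = dependencies(workflow)
# A valid execution order: every job appears after the jobs it depends on.
# Jobs with no ordering between them ("load" and "report") may run in parallel.
order = list(TopologicalSorter(deps).static_order())
```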

Timeouts and fallbacks #

When scheduling workflows, some jobs may take longer than you expect. You can specify timeouts and the rules that apply when they expire: on timeout, the entire workflow can fail, only the dependent jobs can fail, or the dependent jobs can run with a set of default output values provided at schedule time.
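A per-job timeout configuration inside a workflow request might be sketched like this; the field names and the `on_timeout` values are illustrative assumptions that mirror the three behaviours described above:

```python
# Hypothetical timeout configuration for one job in a workflow.
job = {
    "job": "stats.expensive_aggregation",  # hypothetical job spec name
    "timeout_seconds": 300,
    # "fail_workflow":   the entire workflow fails on timeout
    # "fail_dependents": only jobs depending on this one fail
    # "use_defaults":    dependents run with the default outputs below
    "on_timeout": "use_defaults",
    "default_outputs": {"total": 0},       # stand-in outputs if the timeout fires
}
```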