
Data Syncer

TIR's Data Syncer transfers data from an external data source to TIR Datasets. You define a source (where data lives), a destination (which dataset path receives it), and a connection that ties them together with a sync mode and schedule. Run syncs on demand or on a recurring schedule so that training and inference jobs read up-to-date data.

Data Syncer can transfer files of up to 50 GB each. Any file that exceeds this limit is skipped.
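To see which files a sync would skip before you run it, you can compare file sizes against the limit up front. The sketch below is illustrative only: `split_by_size_limit` is a hypothetical helper, not part of TIR, and it assumes that 50 GB means 50 × 10^9 bytes, since the text does not say whether the limit is decimal or binary.

```python
from typing import Dict, List, Tuple

# Assumption: "50GB" is read as 50 * 10**9 bytes.
# Data Syncer may use 50 * 2**30 bytes instead.
MAX_FILE_BYTES = 50 * 10**9


def split_by_size_limit(
    sizes: Dict[str, int], limit: int = MAX_FILE_BYTES
) -> Tuple[List[str], List[str]]:
    """Return (transferable, skipped) file names, each sorted alphabetically.

    sizes maps a file name to its size in bytes.
    """
    transferable = sorted(name for name, size in sizes.items() if size <= limit)
    skipped = sorted(name for name, size in sizes.items() if size > limit)
    return transferable, skipped


if __name__ == "__main__":
    # Hypothetical file listing, for illustration only.
    files = {
        "train/shard-000.parquet": 12 * 10**9,
        "train/raw-dump.tar": 80 * 10**9,
    }
    ok, skipped = split_by_size_limit(files)
    print("transferable:", ok)
    print("skipped:", skipped)
```

Splitting or re-chunking the files reported as skipped, before they reach the source, is one way to keep them inside the limit.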


Key features: multi-source connectors; incremental and full sync; scheduled, cron, and manual runs; EOS dataset destinations.

Quick start



Console workflow

  1. Open TIR and select a project.
  2. In the sidebar, open Data Syncer. You will see Sources, Destinations, and Connections.
  3. Create and save a source, a destination, and then a connection that links them. All three are required before a sync can run.

Sources

A source is a configured connector to an external system (object storage, Drive, etc.). TIR reads objects from that source when a connection runs.

Supported sources

Source               | Authentication       | Incremental sync
AWS S3               | IAM user             | Yes
Azure Blob           | Connection string    | Yes
Google Drive         | OAuth                | Yes
Google Cloud Storage | Service account      | Yes
MinIO                | Access & secret keys | Yes

Creating a source

  1. In Data Syncer, open Sources and choose Create Source.
  2. Pick a connector type and enter the required credentials and settings for your environment.

Step-by-step options for each connector are in Configuring Sources.


After you save, the source can be reused by one or more connections.

Updating a source

Open the source from the list and use Edit to change credentials or settings. Updates apply to future sync jobs; running jobs are not rolled back automatically.


Destinations

A destination maps sync output to an EOS-backed TIR Dataset and a path inside that dataset. You can select an existing dataset or create one first, then point the destination at the folder prefix where synced objects should land.

Creating a destination

  1. In Data Syncer, open Destinations and choose Create Destination.
  2. Select the dataset and the destination path (prefix) for incoming files.
  3. Confirm to create the destination.

Tip: Choose the destination path within the dataset carefully. If incoming files conflict with files that already exist in the dataset, data can be lost.
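One way to reduce this risk is to list what already exists under the chosen prefix and compare it with the relative paths the source will deliver. The sketch below assumes you already have both lists and that a synced object keeps its relative path under the prefix. `find_conflicts` and the sample paths are hypothetical, and the code does not call any TIR API.

```python
import posixpath
from typing import Iterable, List


def find_conflicts(
    existing_keys: Iterable[str], incoming_paths: Iterable[str], prefix: str
) -> List[str]:
    """Return the destination keys that incoming files would collide with."""
    existing = set(existing_keys)
    base = prefix.strip("/")
    conflicts = []
    for rel in incoming_paths:
        rel = rel.lstrip("/")
        # Assumption: the object lands at <prefix>/<relative path>.
        key = posixpath.join(base, rel) if base else rel
        if key in existing:
            conflicts.append(key)
    return sorted(conflicts)


if __name__ == "__main__":
    already_there = ["raw/a.csv", "raw/b.csv"]
    incoming = ["a.csv", "c.csv"]
    print(find_conflicts(already_there, incoming, "raw/"))
```

An empty result means none of the incoming paths matches an existing key under that prefix.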

Updating a destination

Use Edit on a destination to change the target path or dataset selection. Validate downstream jobs or pipelines that depend on the old path before switching.


Connections

A connection binds one source to one destination and controls how and when data is copied. This is where you set sync mode, schedule, and optional manual runs.
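The relationship between the three objects can be pictured as a small data model. This is a sketch for orientation only: the class names, field names, and enum values are hypothetical and do not reflect TIR's actual API or schema.

```python
from dataclasses import dataclass
from enum import Enum


class SyncMode(Enum):
    FULL_REFRESH = "full_refresh"
    INCREMENTAL = "incremental"


class ScheduleMode(Enum):
    SCHEDULED = "scheduled"
    CRON = "cron"
    MANUAL = "manual"


@dataclass(frozen=True)
class Source:
    name: str
    connector: str  # for example "AWS S3" or "MinIO"


@dataclass(frozen=True)
class Destination:
    dataset: str
    path: str  # folder prefix inside the dataset


@dataclass
class Connection:
    # One source and one destination per connection.
    source: Source
    destination: Destination
    sync_mode: SyncMode
    schedule_mode: ScheduleMode
    enabled: bool = True


if __name__ == "__main__":
    # A saved source can be reused by more than one connection.
    s3 = Source(name="raw-bucket", connector="AWS S3")
    nightly = Connection(
        s3, Destination("training-data", "raw/"),
        SyncMode.INCREMENTAL, ScheduleMode.CRON,
    )
    adhoc = Connection(
        s3, Destination("training-data", "snapshots/"),
        SyncMode.FULL_REFRESH, ScheduleMode.MANUAL,
    )
    print(nightly.source is adhoc.source)
```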

Sync modes

The sync mode governs how TIR reads from a source and writes to a destination. TIR supports two sync modes:

Mode         | Behavior
Full refresh | Reads the full source scope and writes to the destination, replacing content according to the connector semantics for that run.
Incremental  | Copies objects added or changed since the last successful sync, when the source supports it.
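The difference between the two modes comes down to which objects are selected for a run. The sketch below assumes that changes are detected by comparing each object's last-modified time with the time of the last successful sync, and that an incremental run with no earlier successful sync copies everything. Both are assumptions, and `select_objects` is a hypothetical function, not TIR's implementation.

```python
from typing import Dict, List, Optional


def select_objects(
    source_mtimes: Dict[str, float],
    mode: str,
    last_success: Optional[float] = None,
) -> List[str]:
    """Pick the object keys to copy in one run.

    source_mtimes maps object key -> last-modified time (epoch seconds).
    last_success is the time of the last successful sync, if any.
    """
    if mode not in ("full_refresh", "incremental"):
        raise ValueError(f"unknown sync mode: {mode}")
    if mode == "full_refresh" or last_success is None:
        # Full refresh, or no earlier successful sync to compare against.
        return sorted(source_mtimes)
    # Incremental: only objects added or changed since the last success.
    return sorted(k for k, m in source_mtimes.items() if m > last_success)


if __name__ == "__main__":
    objects = {"a.csv": 100.0, "b.csv": 200.0, "c.csv": 300.0}
    print(select_objects(objects, "full_refresh", last_success=250.0))
    print(select_objects(objects, "incremental", last_success=150.0))
```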

Schedule modes

The schedule mode defines how often data from your source syncs to the destination. The options are:

Mode      | Behavior
Scheduled | Runs at a regular interval you configure in the UI.
Cron      | Runs on a cron schedule. See Cron expressions for syntax and examples.
Manual    | No automatic runs; you trigger syncs with Sync Now.

You can still use Sync Now on any connection when you need an extra run, regardless of schedule mode.
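For the Scheduled mode, run times follow from simple interval arithmetic. The sketch below assumes the first run happens one interval after a chosen start time. How TIR anchors the first run is not stated here, and `next_runs` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the next `count` run times for a fixed-interval schedule."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Assumption: the first run is one full interval after `start`.
    return [start + interval * i for i in range(1, count + 1)]


if __name__ == "__main__":
    for run in next_runs(datetime(2024, 1, 1, 0, 0), timedelta(hours=6), 3):
        print(run.isoformat())
```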

Creating a connection

  1. In Data Syncer, open Connections and choose Create Connection.
  2. Select the source and destination to link.
  3. Choose Sync mode and Schedule mode, then create the connection.

After creation, use the Overview tab for connection details and the Jobs tab for history.

Enable or disable a connection

To pause sync jobs for a connection, disable it with the toggle in the Actions column. While the connection is disabled, no scheduled sync jobs start; a sync job that is already running is not affected.

To resume, enable the connection with the same toggle. Scheduled jobs, if any, then run as scheduled.
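The enable and disable behavior can be modeled as a small state machine. This is a sketch of the semantics described above, not TIR's implementation; `ConnectionState` and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConnectionState:
    enabled: bool = True
    running_jobs: List[str] = field(default_factory=list)

    def on_schedule_tick(self, job_id: str) -> bool:
        """Start a scheduled job only while the connection is enabled."""
        if not self.enabled:
            return False
        self.running_jobs.append(job_id)
        return True

    def disable(self) -> None:
        # Running jobs are left alone; only future scheduled starts are blocked.
        self.enabled = False

    def enable(self) -> None:
        self.enabled = True


if __name__ == "__main__":
    conn = ConnectionState()
    conn.on_schedule_tick("job-1")
    conn.disable()
    print(conn.running_jobs)                # job-1 is still listed
    print(conn.on_schedule_tick("job-2"))   # no new job while disabled
```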

Updating a connection

Use Edit to change sync mode, schedule, or the linked source or destination. Review active jobs after changes so you understand what the next runs will do.


Jobs and lifecycle

Cancel running jobs

From the Jobs view for a connection, you can cancel an in-progress sync. Cancellation stops further work for that job; files already written remain, and nothing is rolled back automatically.
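The effect of a cancellation on the destination can be illustrated with a copy loop that checks a cancel flag between objects. This is a minimal sketch using in-memory dictionaries. `copy_until_cancelled` is hypothetical, and copying in sorted key order is an assumption made only to keep the example deterministic.

```python
from typing import Callable, Dict, List


def copy_until_cancelled(
    objects: Dict[str, bytes],
    destination: Dict[str, bytes],
    is_cancelled: Callable[[], bool],
) -> List[str]:
    """Copy objects one by one, stopping once is_cancelled() returns True."""
    written = []
    for key in sorted(objects):
        if is_cancelled():
            # Stop further work; objects already written are kept.
            break
        destination[key] = objects[key]
        written.append(key)
    return written


if __name__ == "__main__":
    checks = {"count": 0}

    def cancel_after_two_objects() -> bool:
        checks["count"] += 1
        return checks["count"] > 2

    dest: Dict[str, bytes] = {}
    source = {"a": b"1", "b": b"2", "c": b"3"}
    print(copy_until_cancelled(source, dest, cancel_after_two_objects))
    print(sorted(dest))
```

After a cancellation, the destination can therefore hold only part of the source's objects.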


