Data Factory Integration Runtime
  • 19 Nov 2024

Integration Runtime

An Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide data integration capabilities such as Data Flow and data movement. It acts as a link between an activity and the linked services it uses.

Integration Runtime provides the following data integration capabilities across different network environments:

Data Flow: Allows users to execute a Data Flow in a managed Azure compute environment.

Data movement: Allows users to copy data between public network data stores and private network data stores (on-premises or virtual private network). It supports built-in connectors, format conversion, column mapping, and fast and scalable data transfer.

Activity dispatch: Allows users to dispatch and monitor transformation activities running on a variety of compute services, including Azure Databricks, Azure HDInsight, ML Studio (classic), Azure SQL Database, SQL Server, and others.

SSIS package execution: SQL Server Integration Services (SSIS) packages can be executed natively in a managed Azure compute environment.

Important terms
  • An activity defines the action to be performed in Data Factory and Synapse pipelines.

  • A linked service specifies a destination data store or compute service.

  • An integration runtime acts as a link between the activity and the linked services.
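In a Data Factory JSON definition, this link appears on the linked service: its `connectVia` property names the integration runtime that connects to the data store. The minimal sketch below builds such a definition as a Python dictionary. The names `OnPremSqlServer` and `MySelfHostedIR` and the placeholder connection string are hypothetical; only the `connectVia` / `IntegrationRuntimeReference` shape follows the Data Factory linked-service schema.

```python
import json

# Hypothetical names used only for illustration.
LINKED_SERVICE_NAME = "OnPremSqlServer"
IR_NAME = "MySelfHostedIR"


def linked_service_definition(name: str, ir_name: str) -> dict:
    """Build a SQL Server linked-service definition that connects through a given IR."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                # Placeholder only; supply a real connection string or Key Vault reference.
                "connectionString": "<connection-string>"
            },
            # connectVia ties this linked service to the integration runtime
            # that activities use when they reach the data store.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(linked_service_definition(LINKED_SERVICE_NAME, IR_NAME), indent=2))
```

If `connectVia` is omitted, Data Factory falls back to the default Azure Integration Runtime (AutoResolveIntegrationRuntime).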

Types of Integration Runtime

Data Factory offers three different types of Integration Runtimes from which customers can select the one that best meets their data integration and network environment needs.

The three different types include:

  • Azure
  • Self-hosted
  • Azure-SSIS

The capabilities and network support for each integration runtime type are described in the table below:

| IR type | Public network | Private network |
|---|---|---|
| Azure | Data Flow, Data movement, Activity dispatch | Data Flow, Data movement, Activity dispatch |
| Self-hosted | Data movement, Activity dispatch | Data movement, Activity dispatch |
| Azure-SSIS | SSIS package execution | SSIS package execution |
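To reason about the capability table programmatically, the sketch below transcribes it into a Python lookup and returns the IR types that offer a given capability in a given network environment. It is only an illustration: the names `Network`, `IR_CAPABILITIES`, and `candidate_runtimes` are hypothetical, and Azure IR access to private networks assumes a managed virtual network is enabled.

```python
from enum import Enum


class Network(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


# Capabilities per IR type and network environment, transcribed from the table.
# Assumption: Azure IR private-network support requires a managed virtual network.
IR_CAPABILITIES: dict[str, dict[Network, set[str]]] = {
    "Azure": {
        Network.PUBLIC: {"Data Flow", "Data movement", "Activity dispatch"},
        Network.PRIVATE: {"Data Flow", "Data movement", "Activity dispatch"},
    },
    "Self-hosted": {
        Network.PUBLIC: {"Data movement", "Activity dispatch"},
        Network.PRIVATE: {"Data movement", "Activity dispatch"},
    },
    "Azure-SSIS": {
        Network.PUBLIC: {"SSIS package execution"},
        Network.PRIVATE: {"SSIS package execution"},
    },
}


def candidate_runtimes(capability: str, network: Network) -> list[str]:
    """Return the IR types that offer `capability` in the given network environment."""
    return [ir for ir, by_network in IR_CAPABILITIES.items() if capability in by_network[network]]
```

For example, `candidate_runtimes("Data movement", Network.PRIVATE)` returns `["Azure", "Self-hosted"]`, while SSIS package execution is offered only by the Azure-SSIS IR.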

Azure Integration Runtime

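Azure Integration Runtime is fully managed: Microsoft is responsible for all infrastructure patching, scaling, and maintenance. By default, the IR can access only data stores and services on public networks; reaching private networks requires a managed virtual network (see Network environment below).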

Azure Integration Runtime can perform the following activities:

  1. Execute Data Flows in Azure.

  2. Execute a copy activity between cloud data stores.

  3. Dispatch the following transformation activities in a public network:

    • Databricks Notebook/Jar/Python activity
    • HDInsight Hive activity
    • HDInsight Pig activity
    • HDInsight MapReduce activity
    • HDInsight Spark activity
    • HDInsight Streaming activity
    • ML Studio (classic) Batch Execution activity
    • ML Studio (classic) Update Resource activities
    • Stored Procedure activity
    • Data Lake Analytics U-SQL activity
    • .NET custom activity
    • Web activity
    • Lookup activity
    • Get Metadata activity

Network environment

  • Azure Integration Runtime supports connecting to data stores and compute services that have publicly accessible endpoints.

  • When a managed virtual network is enabled, Azure Integration Runtime also supports connecting to data stores through Private Link in a private network environment.

Self-hosted Integration Runtime

With a self-hosted Integration Runtime, users manage their own infrastructure and hardware.

Users are responsible for all patching, scaling, and maintenance. The IR can access resources in both public and private networks.

Install the self-hosted IR on an on-premises machine or on a virtual machine inside a private network. The self-hosted IR currently runs only on Windows.

The following activities can be carried out by a self-hosted Integration Runtime:

  1. Copying data between a cloud data store and a private network data store.

  2. Dispatching the following transformation activities against compute resources on-premises or in an Azure Virtual Network:

    • HDInsight Hive activity (BYOC: Bring Your Own Cluster)
    • HDInsight Pig activity (BYOC)
    • HDInsight MapReduce activity (BYOC)
    • HDInsight Spark activity (BYOC)
    • HDInsight Streaming activity (BYOC)
    • ML Studio (classic) Batch Execution activity
    • ML Studio (classic) Update Resource activities
    • Stored Procedure activity
    • Data Lake Analytics U-SQL activity
    • Custom activity (runs on Azure Batch)
    • Lookup activity
    • Get Metadata activity

Key advantage over Azure Integration Runtime

  • Consider copying data from a source to a sink in Azure Data Factory or a Synapse workspace that uses a managed virtual network. If the source linked service is associated with the global Azure integration runtime and the sink linked service with an Azure integration runtime in the managed virtual network, both linked services use the Azure integration runtime in the managed virtual network.

  • If the source linked service is instead associated with a self-hosted integration runtime, both the source and sink linked services use the self-hosted integration runtime.

  • In other words, when a managed virtual network is used, the self-hosted integration runtime takes precedence over the Azure integration runtime in Azure Data Factory or Synapse workspaces.
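The precedence rule for a copy activity can be sketched as a small resolution function: each IR kind gets a rank, and the higher-ranked kind is used by both linked services. This is a minimal illustration under stated assumptions, not Data Factory's actual resolution logic: the `IR` enum, `PRECEDENCE` ranks, and `copy_runtime` function are hypothetical, the rule is assumed to apply symmetrically to source and sink, and IR kinds are modeled rather than individual IR instances.

```python
from enum import Enum


class IR(Enum):
    GLOBAL_AZURE = "global Azure IR"
    MANAGED_VNET_AZURE = "Azure IR in managed virtual network"
    SELF_HOSTED = "self-hosted IR"


# Higher rank wins. Ordering follows the precedence described in the text;
# applying it symmetrically to source and sink is an assumption.
PRECEDENCE = {
    IR.GLOBAL_AZURE: 0,
    IR.MANAGED_VNET_AZURE: 1,
    IR.SELF_HOSTED: 2,
}


def copy_runtime(source_ir: IR, sink_ir: IR) -> IR:
    """Return the IR kind that both linked services use for the copy activity."""
    return max(source_ir, sink_ir, key=PRECEDENCE.__getitem__)
```

With this sketch, `copy_runtime(IR.GLOBAL_AZURE, IR.MANAGED_VNET_AZURE)` resolves to the managed virtual network IR, and `copy_runtime(IR.SELF_HOSTED, IR.MANAGED_VNET_AZURE)` resolves to the self-hosted IR, matching the two scenarios above.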

Network environment

  • Users can install a self-hosted IR behind their corporate firewall or inside a virtual private network to perform data integration safely in a private network environment.

  • The self-hosted integration runtime makes only outbound HTTP-based connections to the open internet.

Azure-SSIS Integration Runtime

  • An Azure-SSIS Integration Runtime is a fully managed cluster of Azure virtual machines that run the SSIS engine, allowing users to execute SSIS packages natively.

  • Microsoft is responsible for all infrastructure patching, scaling, and maintenance. The IR can access resources in both public and private networks.

  • Users can create an Azure-SSIS IR to execute SSIS packages natively and lift and shift existing SSIS workloads.

Network environment

  • Azure-SSIS IR can be deployed in either a public or a private network. To access on-premises data, join the Azure-SSIS IR to a virtual network that is connected to your on-premises network.

Linked nodes

The nodes associated with a Data Factory Integration Runtime can be viewed in the Essentials card and in the Integration Runtime resource grid.

The following information is shown for each node:

  • Name
  • Status
  • Max concurrent jobs
  • Last connect time

Linked nodes.png

The Essentials card also displays linked services and related resources such as Data Factory pipelines.

Related services.png

Resource Dashboard

A default resource dashboard is available for Data Factory Integration Runtime resources in the Overview section, allowing users to visualize and track metric data in real time.

Resource dashboard.png

Users are provided with the following predefined dashboard widgets, which can be customized to meet their specific needs:

1. CPU utilization
2. SSIS Executions - Succeeded vs Failed
3. Available Nodes

Monitoring

  1. Navigate to Data Factory Integration Runtime -> Monitoring to configure the monitoring rules for the Integration Runtime.
  2. Select the necessary monitoring metrics and configure their threshold values.
  3. Click Save.

A threshold value can be configured for any listed metric; the monitoring rule is violated when the metric value reaches the value configured in the threshold field.

Monitoring.png

The monitoring rules are saved for the Data Factory Integration Runtime, and the monitoring state of each metric is updated after every monitoring cycle.
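The threshold check performed on each monitoring cycle can be sketched as follows. This is a minimal illustration, not the product's implementation: the `MonitoringRule` class, the `evaluate_cycle` function, the state labels, the inclusive comparison, and the `higher_is_worse` flag (for metrics such as Available Nodes, where a low value is the problem) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MonitoringRule:
    metric: str                    # hypothetical metric key, e.g. "CPU utilization"
    threshold: float               # value entered in the threshold field
    higher_is_worse: bool = True   # assumption: False for metrics like "Available Nodes"

    def is_violated(self, value: float) -> bool:
        """True when the value has reached the threshold in the unhealthy direction."""
        if self.higher_is_worse:
            return value >= self.threshold
        return value <= self.threshold


def evaluate_cycle(rules: list[MonitoringRule], latest: dict[str, float]) -> dict[str, str]:
    """Compute a monitoring state per metric for one monitoring cycle."""
    states: dict[str, str] = {}
    for rule in rules:
        if rule.metric not in latest:
            states[rule.metric] = "No data"      # hypothetical state label
        elif rule.is_violated(latest[rule.metric]):
            states[rule.metric] = "Violated"
        else:
            states[rule.metric] = "Healthy"
    return states
```

For example, with a CPU utilization threshold of 80 and an Available Nodes threshold of 1 (`higher_is_worse=False`), a cycle reporting 85 and 2 marks CPU utilization as violated and Available Nodes as healthy.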

