Connection Types
This page serves as a reference for the connection types available in Cortex. It includes information about the following:
- The parameters available for connection definitions, including the common parameters available to all connections and the parameters specific to different connection types.
Info
Connections can be defined in the Admin Console or via the CLI.
Common connection parameters
All connection definitions must include the following attributes.
Attribute | Description |
---|---|
name | The name given to the connection |
title | A human-readable title for the connection |
description | A description of the connection's contents |
group | |
type | The connection type: file, s3, mongo, or hive |
tags | A list of optional values that you can enter to differentiate connections. Each tag item should include a label and a value. |
connectionParams | The parameters required for the connection type selected, described in the following sections. |
Each connectionParam has the following content: `name`, `title`, `description`, `type`, `required` (boolean), `validation`, and `errorMessage`.
The specific file type/delimiter options for all connection types are: `csv/sep`, `csv/lineSep`, `csv/encoding`, `csv/comment`, `csv/quote`, `csv/escape`, `csv/multiline`, `csv/header`, `json/style`, `json/multiline`, `json/lineSep`, and `json/encoding`.
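As a rough illustration, here is a hypothetical skeleton that uses only the common attributes above; it assumes connectionParams entries are simple name/value pairs, which this page implies but does not state outright.

```yaml
# Hypothetical connection definition skeleton; all names and values are placeholders.
name: my-connection
title: My Connection
description: Example connection definition
group: examples
type: file                  # one of: file, s3, mongo, hive
tags:
  - label: env              # each tag item includes a label and a value
    value: dev
connectionParams:           # type-specific parameters; see the sections below
  - name: contentType
    value: CSV
```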
S3 Connections
S3 connectionParams
When creating a connection definition for `s3`, the following parameters are available.
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
publicKey | String | | An AWS public key with at least read access to the S3 bucket. (Do NOT use your AWS admin key.) | true |
secretKey | String | | An AWS secret key with at least read access to the S3 bucket. (Do NOT use your AWS admin key.) | true |
s3Endpoint | String | | The S3 HTTP(S) URL to use. Typically only applicable when using a server like Minio and hosting a private instance. | false |
pathStyleAccess | Boolean | | Enables/disables path-style access for non-AWS S3 connections (Minio/NooBaa). | false |
sslEnabled | Boolean | | True if the connection uses SSL encryption when connecting to S3. | false |
contentType | String | | The type of file; valid values are CSV, JSON, and Parquet. | true |
irsaEnabled | Boolean | false | Set to true when IRSA is enabled in the Cortex Helm chart .yaml; determines whether the connection provides AWS API credentials or inherits them via IRSA. | false |
qualifiedBy | Boolean | | A property of publicKey and secretKey used when IRSA is enabled. | false |
S3 YAML
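A minimal sketch of an s3 connection definition, assuming connectionParams entries are name/value pairs and that secure: true marks values resolved from secrets (as noted in the Mongo and GCS tables below); all values are placeholders.

```yaml
# Hypothetical s3 connection definition; all values are placeholders.
name: my-s3-connection
title: My S3 Connection
description: Read-only connection to an S3 bucket of CSV files
type: s3
connectionParams:
  - name: publicKey
    value: AKIAEXAMPLEKEY   # a read-only key, never an AWS admin key
  - name: secretKey
    value: mySecretRef
    secure: true            # assumed syntax for resolving the value from a secret
  - name: contentType
    value: CSV              # valid values: CSV, JSON, Parquet
```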
S3 File Stream Connections
For use with Spark
S3 File Stream ConnectionParams
S3 File Stream uses the same parameters as the S3 connection (table above), plus the following additional connectionParams properties (refer to the bootstrap YAML below):
- Bootstrap URI: The S3 file URI to use for fetching base records that are used to infer the schema (e.g. `uri` is set to `s3a://path/to/file.txt`).
- Stream Read Directory: The S3 directory to stream updates from (e.g. `stream_read_dir` is set to `s3a://path/to/files`).
- Trigger: Allows you to set up an ingestion schedule for Data Sources that pull from this connection. When `isTriggered` is set to `true`, Data Sources must be triggered (rebuilt) manually using the API method or the Fabric Console UI. When `isTriggered` is set to `false` (default), the following parameters are also set so that the Connection is polled automatically on a schedule:
  - `pollInterval`: How often, in seconds, the Data Sources poll the Connection and rebuild automatically.
  - `maxFilesPerTrigger`: The number of files (integer) ingested each time the Connection is polled.
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
uri | String | | The S3 file URI to use for fetching base records that are used to infer the schema, e.g. s3a://path/to/file.txt | false |
stream_read_dir | String | | The S3 directory to stream updates from, e.g. s3a://path/to/files | true |
isTriggered | boolean | false | Stops standard polling of streaming data and instead ingests all available files on trigger. | false |
maxFilesPerTrigger | integer | "1" | The number of files to process for each poll interval. | false |
pollInterval | string | "300" | The period between polls, in seconds. | false |
S3 File Stream YAML
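A hypothetical bootstrap YAML for an S3 File Stream connection; the type identifier s3FileStream is an assumption (this page does not name it), and all values are placeholders.

```yaml
# Hypothetical S3 File Stream connection; the type identifier is assumed.
name: my-s3-stream-connection
title: My S3 File Stream Connection
type: s3FileStream          # assumed identifier, not confirmed by this page
connectionParams:
  - name: publicKey
    value: AKIAEXAMPLEKEY
  - name: secretKey
    value: mySecretRef
    secure: true
  - name: contentType
    value: CSV
  - name: uri
    value: s3a://path/to/file.txt   # bootstrap file used to infer the schema
  - name: stream_read_dir
    value: s3a://path/to/files      # directory to stream updates from
  - name: isTriggered
    value: false                    # false (default): poll on a schedule
  - name: pollInterval
    value: "300"                    # seconds between polls
  - name: maxFilesPerTrigger
    value: "1"                      # files ingested per poll
```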
Google Cloud Storage Connections
For use with Spark
GCS ConnectionParams
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
uri | string | | The GCS file URI to use, e.g. gs://path/to/file.txt | true |
serviceAccountKey | string | | Google Service Account JSON credentials to authenticate against GCS (include secure: true). | false |
storageRoot | string | | The GCS HTTP(S) URL to use, e.g. https://storage.googleapis.com/. Typically only applicable when using a server like Minio and hosting a private instance. | false |
servicePath | string | storage/v1/ | The GCS service path to use. | false |
GCS YAML
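A minimal sketch of a GCS connection definition; the type identifier gcs is an assumption, and the service account reference is a placeholder.

```yaml
# Hypothetical GCS connection definition; the type identifier is assumed.
name: my-gcs-connection
title: My GCS Connection
type: gcs                   # assumed identifier, not confirmed by this page
connectionParams:
  - name: uri
    value: gs://path/to/file.txt
  - name: serviceAccountKey
    value: myServiceAccountKeyRef
    secure: true            # per the table above, include secure: true
```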
GCS File Stream Connections
GCS File Stream ConnectionParams
GCS File Stream uses the same parameters as the GCS connection (above), plus the following additional connectionParams properties (refer to the YAML below):
- Bootstrap URI: The GCS file URI to use for fetching base records that are used to infer the schema (e.g. `uri` is set to `gs://path/to/file.txt`).
- Stream Read Directory: The GCS directory to stream updates from (e.g. `stream_read_dir` is set to `gs://path/to/files`).
- Trigger: Allows you to set up an ingestion schedule for Data Sources that pull from this connection. When `isTriggered` is set to `true`, Data Sources must be triggered (rebuilt) manually using the API method or the Fabric Console UI. When `isTriggered` is set to `false` (default), the following parameters are also set so that the Connection is polled automatically on a schedule:
  - `pollInterval`: How often, in seconds, the Data Sources poll the Connection and rebuild automatically.
  - `maxFilesPerTrigger`: The number of files (integer) ingested each time the Connection is polled.
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
uri | String | | The GCS file URI to use for fetching base records that are used to infer the schema, e.g. gs://path/to/file.txt | false |
stream_read_dir | String | | The GCS directory to stream updates from, e.g. gs://path/to/files | true |
isTriggered | boolean | false | Stops standard polling of streaming data and instead ingests all available files on trigger. | false |
maxFilesPerTrigger | integer | "1" | The number of files to process for each poll interval. | false |
pollInterval | string | "300" | The period between polls, in seconds. | false |
GCS File Stream YAML
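A hypothetical YAML for a GCS File Stream connection; the type identifier gcsFileStream is an assumption, and all values are placeholders.

```yaml
# Hypothetical GCS File Stream connection; the type identifier is assumed.
name: my-gcs-stream-connection
title: My GCS File Stream Connection
type: gcsFileStream         # assumed identifier, not confirmed by this page
connectionParams:
  - name: serviceAccountKey
    value: myServiceAccountKeyRef
    secure: true
  - name: uri
    value: gs://path/to/file.txt    # bootstrap file used to infer the schema
  - name: stream_read_dir
    value: gs://path/to/files       # directory to stream updates from
  - name: isTriggered
    value: false                    # false (default): poll on a schedule
  - name: pollInterval
    value: "300"                    # seconds between polls
  - name: maxFilesPerTrigger
    value: "1"                      # files ingested per poll
```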
Local Files Connections
Local Files connectionParams
When creating a connection definition for `file`, the following parameters are available.
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
uri | String | | The file URI, e.g. file://path/to/file.txt | false |
contentType | String | | The type of file; valid values are CSV, JSON, and Parquet. | true |
Local Files YAML
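A minimal sketch of a file connection using the two parameters above; the path is a placeholder.

```yaml
# Hypothetical file connection definition.
name: my-file-connection
title: My Local File Connection
type: file
connectionParams:
  - name: uri
    value: file://path/to/file.txt
  - name: contentType
    value: JSON             # valid values: CSV, JSON, Parquet
```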
Mongo Connections
Mongo connectionParams
When creating a connection definition for `mongo`, the following parameters are available.
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
username | String | | The username for authenticating to the database. | false |
password | String | | The secret ref containing the password for authenticating to the database. | false |
uri | String | mongodb://{host:port}/{database} | The URI string including the database name, username, and password. NOTE: To set a secret variable, set the parameter secure: true. See https://docs.mongodb.com/manual/reference/connection-string/ for more details. | true |
collection | String | | The name of the collection to query in the Mongo database. | false |
database | String | | The name of the Mongo database to connect to. | false |
sslEnabled | Boolean | false | True if the connection uses SSL encryption when connecting to the database. (Recommended) | false |
Mongo YAML
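A minimal sketch of a mongo connection definition; hosts and names are placeholders, and the credential-bearing URI is marked secure: true per the note in the table above.

```yaml
# Hypothetical mongo connection definition; all values are placeholders.
name: my-mongo-connection
title: My Mongo Connection
type: mongo
connectionParams:
  - name: uri
    value: mongodb://user:pass@mongo-host:27017/mydb
    secure: true            # the URI embeds credentials, so store it as a secret
  - name: database
    value: mydb
  - name: collection
    value: myCollection
  - name: sslEnabled
    value: true             # recommended
```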
Hive Connections
Hive connectionParams
When creating a connection definition for `hive`, the following parameters are available.
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
autoCreateAll | Boolean | "true" | Optional flag that can reduce errors with an empty metastore database as of Hive 2.1. | false |
schemaVerification | Boolean | "false" | Optional flag that can reduce errors with an empty metastore database as of Hive 2.1. | false |
metastoreUri | String | | The thrift URL of the Hive Metastore Server. | true |
connectionUrl | String | "jdbc:hive2://{host:port}/{database}" | The JDBC-compliant Hive URI used to connect to the database. The URI format should conform to this pattern: jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;initFile=<file>;sess_var_list?hive_conf_list#hive_var_list. See https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLFormat and https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hive-metastore.html for more details. | true |
connectionUserName | String | | The username for authenticating to the database. | false |
connectionPassword | String | | The password for authenticating as an authorized user. NOTE: To set a secret variable, set the parameter secure: true. | false |
metastoreVersion | String | | The version of the Hive Metastore to connect to. | true |
metastoreJars | String | | Jars to use when connecting to the Hive Metastore, dependent on the Hive version. See https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options for more details. | false |
warehouseDir | String | "spark-warehouse" | The location to use for the spark warehouse dir. | false |
Hive YAML
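A minimal sketch of a hive connection definition; hosts, ports, and the metastore version are placeholders.

```yaml
# Hypothetical hive connection definition; all values are placeholders.
name: my-hive-connection
title: My Hive Connection
type: hive
connectionParams:
  - name: metastoreUri
    value: thrift://metastore-host:9083
  - name: metastoreVersion
    value: "2.3.7"          # placeholder version
  - name: connectionUrl
    value: jdbc:hive2://hive-host:10000/default
  - name: connectionUserName
    value: hiveuser
  - name: connectionPassword
    value: myHivePasswordRef
    secure: true            # assumed syntax for resolving the value from a secret
```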
JDBC Generic Connections
JDBC-generic ConnectionParams
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
uri | String | jdbc:{protocol}://{host:port}/{database} | A fully qualified JDBC URI containing the dialect, host, port, database and other options. | true |
username | String | | The username that is used to gain access to the database. | false |
password | String | | The password that is used for authenticating as an authorized user. | false |
classname | String | | The classname of the JDBC driver to be loaded into the Cortex runtime. | true |
JDBC Generic YAML
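A minimal sketch of a generic JDBC connection definition; the type identifier jdbc-generic is taken from the heading above and is not confirmed, and the PostgreSQL URI and driver classname are illustrative placeholders.

```yaml
# Hypothetical generic JDBC connection definition; all values are placeholders.
name: my-jdbc-connection
title: My JDBC Connection
type: jdbc-generic          # assumed identifier, taken from the heading above
connectionParams:
  - name: uri
    value: jdbc:postgresql://db-host:5432/mydb
  - name: classname
    value: org.postgresql.Driver    # the driver must be available to the Cortex runtime
  - name: username
    value: dbuser
  - name: password
    value: myDbPasswordRef
    secure: true
```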
JDBC CData Connections
JDBC CData Connections are built into a Skill template in the cortex-fabric-examples GitHub repo.
CData is a third-party provider that exposes commonly used databases and services (e.g. Salesforce, Twitter) through JDBC connections. When you select a CData connection type in Fabric Console, the parameters available for that connection type are selectable. The links in this table take you to documentation provided by CData, so you can better understand how to configure these parameters.
Prerequisites for configuring CData connections are found here.
Instructions for working with CData JDBC connectors are available on the CData website.
1. Upload the driver to Managed Content and make note of the URI.
2. Go to the CData help website to view the online documentation for your driver.
JDBC-cdata ConnectionParams
Parameter | Type | Default | Description | Required |
---|---|---|---|---|
plugin_properties | String | | (secure) The key for the JSON-formatted configuration file stored in Managed Content and passed to the plugin at startup. | false |
classname | String | | The classname of the JDBC driver. You can find this in the online documentation for your specific driver (under "Getting Started") on the CData help website. | true |
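JDBC-cdata YAML
A minimal sketch of a CData JDBC connection definition; the type identifier jdbc-cdata is taken from the heading above and is not confirmed, and the Salesforce driver classname is only an example; look up the real classname in the CData online documentation for your driver.

```yaml
# Hypothetical CData JDBC connection definition; the type identifier and the
# classname are examples, not confirmed values.
name: my-cdata-connection
title: My CData Connection
type: jdbc-cdata            # assumed identifier, taken from the heading above
connectionParams:
  - name: classname
    value: cdata.jdbc.salesforce.SalesforceDriver  # verify in the CData docs
  - name: plugin_properties
    value: myPluginPropertiesKey  # key of the JSON config file in Managed Content
    secure: true
```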