Version: 6.3.3

Connection Types

This page serves as a reference for the connection types available in Cortex. It includes information about the following:

  • The parameters available for connection definitions, including the common parameters available to all connections and the parameters specific to different connection types.

Common connection parameters

All connection definitions must include the following.

Attribute | Description
name | The name given to the connection
title | Human readable connection type
description | Description of the file contents
group |
type | The connection type: file, s3, mongo, or hive
tags | The tags list contains optional values that you can enter to differentiate connections. Each tag item should include a label and value.
connectionParams | The parameters required for the connection type selected, described in the following sections.

Each connectionParam has the following fields: name, title, description, type, required (boolean), validation, and errorMessage.
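
As an illustration, a single connectionParams entry carrying each of these fields might look like the sketch below. The parameter name and its text are hypothetical and exist only for this example; the layout follows the YAML definitions shown later on this page.

```yaml
connectionParams:
  - name: exampleParam              # hypothetical parameter name
    title: Example Parameter
    description: A hypothetical parameter used only for illustration.
    type: String
    required: false
    validation: "/^.+$/g"           # pattern string, in the same style as the definitions on this page
    errorMessage: Value must not be empty.
```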

The file-type-specific delimiter options for all connection types are: csv/sep, csv/lineSep, csv/encoding, csv/comment, csv/quote, csv/escape, csv/multiline, csv/header, json/style, json/multiline, json/lineSep, json/encoding.

S3 Connections

S3 connectionParams

When creating a connection definition for s3, the following parameters are available.

Parameter | Type | Default | Description | Required
publicKey | String |  | An AWS public key with at least read access to the S3 Bucket. (Do NOT use your AWS admin key.) | true
secretKey | String |  | An AWS secret key with at least read access to the S3 Bucket. (Do NOT use your AWS admin key.) | true
s3Endpoint | String |  | The S3 HTTP(s):// URL to use. Typically only applicable when using a server like Minio and hosting a private instance. | false
pathStyleAccess | Boolean |  | Enable/disable path style access for non-AWS s3 connections (minio/noobaa) | false
sslEnabled | Boolean |  | True if the connection uses SSL encryption when connecting to S3. | false
contentType | String |  | The type of file; validValues are CSV, JSON, Parquet | true
irsaEnabled | Boolean | false | Set to true when IRSA is enabled in Cortex Helm chart .yaml; used to distinguish if a connection should provide AWS API creds or inherit them via IRSA | false
qualifiedBy | Boolean |  | A property of publicKey and secretKey when IRSA is enabled | false
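
To make the parameters concrete, here is a minimal sketch of a connection that uses the s3 type with key-based credentials. The instance layout (name, connectionType, params with name/value pairs) is modeled on the example connection embedded in the Mongo definition on this page; the connection name, key values, and the `#SECURE.` secret reference are assumptions for illustration only.

```yaml
name: default/exampleS3Connection      # hypothetical connection name
title: Example S3 Connection
description: Example S3 Connection
connectionType: s3
allowWrite: false
params:
  - name: publicKey
    value: EXAMPLE_PUBLIC_KEY           # placeholder, not a real key
  - name: secretKey
    value: "#SECURE.example-s3-secret"  # assumed secret reference format
  - name: contentType
    value: CSV
  - name: csv/header
    value: "true"                       # first line holds column names
```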

S3 YAML

- name: s3
  title: S3 Connection
  description: File storage with S3.
  group: cortex
  type: s3
  tags:
    - label: category.connection.type
      value: Files
    - label: category.connection.type
      value: Cloud Storage
  connectionParams:
    - name: irsaEnabled
      title: Use IAM Role Service Account (IRSA) Authentication
      description: >
        Allows for credentials to be inherited through IRSA
      type: Boolean
      default: false
      required: false
      validation: "/^true|false$/g"
    - name: publicKey
      title: Public Access Key
      description: >
        An AWS public key with at least read access to the S3 Bucket. (Do NOT use your AWS admin key.)
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Invalid public access key.
      qualifiedBy: irsaEnabled=false
    - name: secretKey
      title: Secret Access Key
      description: >
        An AWS secret key with at least read access to the S3 Bucket. (Do NOT use your AWS admin key.)
      type: String
      secure: true
      required: true
      validation: "/^.+$/g"
      errorMessage: Invalid secret access key.
      qualifiedBy: irsaEnabled=false
    - name: s3Endpoint
      title: S3 API Endpoint
      description: >
        The S3 HTTP(s):// URL to use. Typically only applicable when using a server like Minio and hosting a private instance.
      type: String
      required: false
      validation: "/^http(s)?:\\/\\/.+$/g"
      errorMessage: Invalid URL.
    - name: pathStyleAccess
      title: Path Style Access (Non-AWS)
      description: >
        Enable/disable path style access for non-AWS s3 connections (minio/noobaa).
      type: Boolean
      required: false
      validation: "/^true|false$/g"
      errorMessage: Must be true or false.
    - name: sslEnabled
      description: >
        True if the connection uses SSL encryption when connecting to S3.
      title: SSL Enabled
      type: Boolean
      required: false
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
    - name: contentType
      type: String
      title: Content Type
      description: Description of the file type.
      required: true
      validValues:
        - CSV
        - JSON
        - Parquet
    - name: csv/sep
      type: String
      title: Separator
      description: Character used to delimit fields in the record.
      defaultValue: ","
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid separator; must be a single character.
    - name: csv/lineSep
      type: String
      title: Line Separator
      description: The line separator that should be used for parsing. Maximum length is 1 character.
      defaultValue: "\n"
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid line separator; must be a single character.
    - name: csv/encoding
      title: Encoding
      description: decodes the CSV files by the given encoding type.
      type: String
      qualifiedBy: contentType
      defaultValue: "UTF-8"
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect encoding type.
    - name: csv/comment
      type: String
      description: sets a single character used for skipping lines beginning with this character.
      defaultValue: "\""
      title: Comment Character
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid comment character; must be a single character.
    - name: csv/quote
      type: String
      description: Character used to denote quotation marks (single or double quotes).
      defaultValue: "\""
      title: Quote Character
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid quote character; must be a single character.
    - name: csv/escape
      type: String
      description: Character used to escape values that contain delimiters.
      defaultValue: "\""
      title: Escape Character
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid escape character; must be a single character.
    - name: csv/multiline
      type: Boolean
      title: Multiline
      description: Parse one record, which may span multiple lines.
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
      required: false
      qualifiedBy: contentType
    - name: csv/header
      type: Boolean
      title: First Line is Header Row
      description: True if the first line of the file contains a header row with column names.
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
      required: false
      qualifiedBy: contentType
    - name: json/style
      type: String
      title: JSON Style
      description: Format style of the JSON file (lines, array, or object).
      defaultValue: "lines"
      validValues:
        - lines
        - array
        - object
      required: true
      qualifiedBy: contentType
      errorMessage: Must be lines, array, or object.
    - name: json/multiline
      type: Boolean
      title: Multiline
      description: Parse one record, which may span multiple lines.
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
      required: false
      qualifiedBy: contentType
    - name: json/lineSep
      type: String
      title: Line Separator
      description: The line separator that should be used for parsing. Maximum length is 1 character.
      defaultValue: "\n"
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid delimiter; must be a single character.
    - name: json/encoding
      title: Encoding
      description: >
        allows to forcibly set one of standard basic or extended encoding for the JSON files.
        For example UTF-16BE, UTF-32LE.
        If the encoding is not specified and multiLine is set to true, it will be detected automatically.
      type: String
      qualifiedBy: contentType
      defaultValue: "UTF-8"
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect encoding type.

S3 File Stream Connections

For use with Spark

S3 File Stream ConnectionParams

S3 File Stream has all the same properties as the S3 connection (see the table above), plus the following additional connectionParams properties (refer to the bootstrap.yaml below):

  • Bootstrap URI: The S3 File URI to use for fetching base records that are used to infer the schema. (e.g. uri is set to s3a://path/to/file.txt).

  • Stream Read Directory: The S3 directory to stream updates from (e.g. stream_read_dir is set to s3a://path/to/files).

  • Trigger: Allows you to set up an ingestion schedule for Data Sources that pull from this connection. When isTriggered is set to true, Data Sources must be triggered (rebuilt) manually using the API method or the Fabric Console UI.

    When isTriggered is set to false (default), the following parameters are also set to automatically poll the Connection on a schedule:

    • pollInterval (in seconds): how often the Data Sources poll the Connection and rebuild automatically
    • maxFilesPerTrigger (integer): number of files ingested each time the Connection is polled

Parameter | Type | Default | Description | Required
uri | String |  | The S3 File URI to use for fetching base records that are used to infer the schema, s3a://path/to/file.txt | false
stream_read_dir | String |  | The S3 directory to stream updates from, s3a://path/to/files | true
isTriggered | boolean | false | Stops standard polling of streaming data and instead ingests all available files on trigger. | false
maxFilesPerTrigger | integer | "1" | The number of files to process for each poll interval. | false
pollInterval | string | "300" | The period between polling, in seconds | false
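
As a sketch of the scheduled (non-triggered) mode described above, the following hypothetical connection polls every 600 seconds and ingests up to 5 files per poll. The instance layout is modeled on the example connection embedded in the Mongo definition on this page; the connection name and all values are assumptions, and the S3 credential parameters are omitted for brevity.

```yaml
name: default/exampleS3FileStream      # hypothetical connection name
title: Example S3 File Stream
description: Example S3 File Stream
connectionType: s3FileStream
params:
  - name: uri
    value: s3a://path/to/file.txt       # bootstrap file used to infer the schema
  - name: stream_read_dir
    value: s3a://path/to/files          # directory to stream updates from
  - name: isTriggered
    value: "false"                      # poll on a schedule instead of manual triggers
  - name: pollInterval
    value: "600"                        # seconds between polls
  - name: maxFilesPerTrigger
    value: "5"                          # files ingested per poll
```

Setting isTriggered to "true" instead would leave pollInterval and maxFilesPerTrigger unused, since both are qualified by isTriggered=false in the definition.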

S3 File Stream YAML

...
- name: s3FileStream
  title: S3 File Stream
  description: Stream files with S3.
  group: cortex
  type: s3FileStream
  tags:
    - label: category.connection.type
      value: Files
    - label: category.connection.type
      value: Cloud Storage
    - label: category.connection.type
      value: Streaming
  connectionParams:
    - name: uri
      title: Bootstrap URI
      description: >
        The S3 File URI to use for base records and schema inference, `s3a://path/to/file.txt`.
      type: String
      required: false
      validation: "/^.+$/g"
      errorMessage: Invalid S3 File URI.
    - name: stream_read_dir
      title: Stream Read Directory
      description: >
        The S3 directory to stream updates from, `s3a://path/to/files`.
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Invalid S3 File URI...
    - name: isTriggered
      title: Trigger manually using Data Source ingest
      description: >
        Stops standard polling of streaming data and instead ingests all available files on trigger.
      type: Boolean
      default: false
      required: false
      validation: "/^true|false$/g"
    - name: maxFilesPerTrigger
      type: String
      title: Max Files per Poll Interval
      description: The number of files to process for each poll interval.
      defaultValue: "1"
      required: false
      qualifiedBy: isTriggered=false
      validation: "/^[1-9]\\d{0,7}$/g"
      errorMessage: Invalid number of files; must be an integer of 8 digits or less.
    - name: pollInterval
      type: String
      title: Poll Interval
      description: The poll interval in seconds.
      defaultValue: "300"
      required: false
      qualifiedBy: isTriggered=false
      validation: "/^[1-9]\\d{0,7}$/g"
      errorMessage: Invalid poll interval; must be an integer of 8 digits or less.

Google Cloud Storage Connections

For use with Spark

GCS ConnectionParams

Parameter | Type | Default | Description | Required
uri | string |  | The GCS File URI to use, gs://path/to/file.txt | true
serviceAccountKey | string |  | Google Service Account Json credentials to authenticate against GCS (include: secure:true) | false
storageRoot | string |  | The GCS HTTP(s):// URL to use, https://storage.googleapis.com/. Typically only applicable when using a server like Minio and hosting a private instance | false
servicePath | string | storage/v1/ | The GCS Service Path to use | false
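
A minimal sketch of a connection using the gcs type follows. The instance layout is modeled on the example connection embedded in the Mongo definition on this page; the connection name and secret key name are hypothetical. The `#SECURE.` prefix is taken from the validation pattern for serviceAccountKey in the definition below.

```yaml
name: default/exampleGcsConnection     # hypothetical connection name
title: Example GCS Connection
description: Example GCS Connection
connectionType: gcs
allowWrite: false
params:
  - name: uri
    value: gs://path/to/file.txt
  - name: serviceAccountKey
    value: "#SECURE.example-gcs-key"    # hypothetical secret name
  - name: storageRoot
    value: https://storage.googleapis.com/
  - name: servicePath
    value: storage/v1/
```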

GCS YAML

- name: gcs
  title: GCS Connection
  description: File storage with GCS.
  group: cortex
  type: gcs
  tags:
    - label: category.connection.type
      value: Files
    - label: category.connection.type
      value: Cloud Storage
  connectionParams:
    - name: uri
      title: File URI
      description: >
        The GCS File URI to use, `gs://path/to/file.txt`.
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Invalid GCS File URI.
    - name: serviceAccountKey
      title: Service Account Key Json Secret
      description: >
        Google Service Account Json credentials to authenticate against GCS.
      type: String
      secure: true
      required: false
      validation: "/^#SECURE\\..+$/g"
      errorMessage: Invalid Google Service Account Key
    - name: storageRoot
      title: GCS API Root
      description: >
        The GCS HTTP(s):// URL to use, `https://storage.googleapis.com/`. Typically only applicable when using a server like Minio and hosting a private instance.
      type: String
      required: false
      validation: "/^http(s)?:\\/\\/.+$/g"
      errorMessage: Invalid URL.
    - name: servicePath
      title: GCS Service Path
      description: >
        The GCS Service Path to use, `storage/v1/`.
      type: String
      required: false
      validation: "/^.+$/g"
      errorMessage: Invalid GCS Service Path.
    - name: csv/sep
      type: String
      title: Separator
      description: Character used to delimit fields in the record.
      defaultValue: ","
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid separator; must be a single character.
    - name: csv/lineSep
      type: String
      title: Line Separator
      description: The line separator that should be used for parsing. Maximum length is 1 character.
      defaultValue: "\n"
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid line separator; must be a single character.
    - name: csv/encoding
      title: Encoding
      description: decodes the CSV files by the given encoding type.
      type: String
      qualifiedBy: contentType
      defaultValue: "UTF-8"
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect encoding type.
    - name: csv/comment
      type: String
      description: sets a single character used for skipping lines beginning with this character.
      defaultValue: '"'
      title: Comment Character
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid comment character; must be a single character.
    - name: csv/quote
      type: String
      description: Character used to denote quotation marks (single or double quotes).
      defaultValue: '"'
      title: Quote Character
      required: false
      qualifiedBy: contentType
      validation: '/^(''|")$/'
      errorMessage: Invalid quote character; must be a single character.
    - name: csv/escape
      type: String
      description: Character used to escape values that contain delimiters.
      defaultValue: '"'
      title: Escape Character
      required: false
      qualifiedBy: contentType
      validation: "/^.$/g"
      errorMessage: Invalid escape character; must be a single character.
    - name: csv/multiline
      type: Boolean
      title: Multiline of Line Separator
      description: Parse one record, which may span multiple lines.
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
      required: false
      qualifiedBy: contentType
    - name: csv/header
      type: Boolean
      title: First Line is Header Row
      description: True if the first line of the file contains a header row with column names.
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
      required: false
      qualifiedBy: csv/multiline=true
    - name: json/multiline
      type: Boolean
      title: Multiline of JSON Style
      description: Parse one record, which may span multiple lines.
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
      required: false
      qualifiedBy: contentType
    - name: json/style
      type: String
      title: JSON Style
      description: Format style of the JSON file (lines, array, or object).
      defaultValue: "lines"
      validValues:
        - lines
        - array
        - object
      required: false
      qualifiedBy: json/multiline=true
      errorMessage: Must be lines, array, or object.
    - name: json/lineSep
      type: String
      title: Line Separator
      description: The line separator that should be used for parsing. Maximum length is 1 character.
      defaultValue: "\n"
      required: false
      qualifiedBy: json/multiline=true
      validation: "/^.$/g"
      errorMessage: Invalid delimiter; must be a single character.
    - name: json/encoding
      title: Encoding
      description: >
        allows to forcibly set one of standard basic or extended encoding for the JSON files.
        For example UTF-16BE, UTF-32LE.
        If the encoding is not specified and multiLine is set to true, it will be detected automatically.
      type: String
      qualifiedBy: json/multiline=true
      defaultValue: "UTF-8"
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect encoding type.

GCS File Stream Connections

GCS File Stream ConnectionParams

GCS File Stream has all the same properties as the GCS connection (see the table above), plus the following additional connectionParams properties (refer to the YAML below):

  • Bootstrap URI: The GCS File URI to use for fetching base records that are used to infer the schema. (e.g. uri is set to gs://path/to/file.txt).

  • Stream Read Directory: The GCS directory to stream updates from (e.g. stream_read_dir is set to gs://path/to/file.txt).

  • Trigger: Allows you to set up an ingestion schedule for Data Sources that pull from this connection. When isTriggered is set to true, Data Sources must be triggered (rebuilt) manually using the API method or the Fabric Console UI.

    When isTriggered is set to false (default), the following parameters are also set to automatically poll the Connection on a schedule:

    • pollInterval (in seconds): how often the Data Sources poll the Connection and rebuild automatically
    • maxFilesPerTrigger (integer): number of files ingested each time the Connection is polled

Parameter | Type | Default | Description | Required
uri | String |  | The GCS File URI to use for fetching base records that are used to infer the schema, gs://path/to/file.txt | false
stream_read_dir | String |  | The GCS directory to stream updates from, gs://path/to/file.txt. | true
isTriggered | boolean | false | Stops standard polling of streaming data and instead ingests all available files on trigger. | false
maxFilesPerTrigger | integer | "1" | The number of files to process for each poll interval. | false
pollInterval | string | "300" | The period between polling, in seconds | false
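
As a sketch of the manually triggered mode, the hypothetical connection below sets isTriggered so that Data Sources ingest all available files only when a rebuild is triggered. The instance layout is modeled on the example connection embedded in the Mongo definition on this page; the connection name and values are assumptions, and the GCS credential parameters are omitted for brevity.

```yaml
name: default/exampleGcsFileStream     # hypothetical connection name
title: Example GCS File Stream
description: Example GCS File Stream
connectionType: gcsFileStream
params:
  - name: uri
    value: gs://path/to/file.txt        # bootstrap file used to infer the schema
  - name: stream_read_dir
    value: gs://path/to/file.txt        # stream location, as written in the table above
  - name: isTriggered
    value: "true"                       # no scheduled polling; rebuild manually
```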

GCS File Stream YAML

- name: gcsFileStream
  title: GCS Filestream
  description: Stream files in Google Cloud Storage.
  group: cortex
  type: gcsFileStream
  tags:
    - label: category.connection.type
      value: Files
    - label: category.connection.type
      value: Cloud Storage
    - label: category.connection.type
      value: Streaming
  connectionParams:
    - name: uri
      title: Bootstrap URI
      description: >
        The GCS File URI to use for base records and schema inference, `gs://path/to/file.txt`.
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Invalid GCS File URI.
    - name: stream_read_dir
      title: Stream Read Directory
      description: >
        The GCS directory to stream updates from, `gs://path/to/file.txt`.
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Invalid GCS File URI.
    - name: isTriggered
      title: Trigger manually using Data Source ingest
      description: >
        Stops standard polling of streaming data and instead ingests all available files on trigger.
      type: Boolean
      default: false
      required: false
      validation: "/^true|false$/g"
    - name: maxFilesPerTrigger
      type: String
      title: Max Files per Poll Interval
      description: The number of files to process for each poll interval.
      defaultValue: "1"
      required: false
      qualifiedBy: isTriggered=false
      validation: "/^[1-9]\\d{0,7}$/g"
      errorMessage: Invalid number of files; must be an integer of 8 digits or less.
    - name: pollInterval
      type: String
      title: Poll Interval
      description: The poll interval in seconds.
      defaultValue: "300"
      required: false
      qualifiedBy: isTriggered=false
      validation: "/^[1-9]\\d{0,7}$/g"
      errorMessage: Invalid poll interval; must be an integer of 8 digits or less.

Local Files Connections

Local Files connectionParams

When creating a connection definition for file, the following parameters are available.

Parameter | Type | Default | Description | Required
uri | String |  | The file URI file://path/to/file.txt | false
contentType | String |  | The type of file; validValues are CSV, JSON, Parquet | true
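
A minimal sketch of a connection using the file type to read a JSON file is shown below. The instance layout is modeled on the example connection embedded in the Mongo definition on this page; the connection name and values are hypothetical.

```yaml
name: default/exampleFileConnection    # hypothetical connection name
title: Example Local File Connection
description: Example Local File Connection
connectionType: file
allowWrite: false
params:
  - name: uri
    value: file://path/to/file.txt
  - name: contentType
    value: JSON
  - name: json/style
    value: lines                        # one of lines, array, or object
```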

Local Files YAML

connectionTypes:
  - name: file
    title: Local File
    description: Local File storage.
    group: cortex
    type: file
    tags:
      - label: category.connection.type
        value: Files
      - label: category.connection.type
        value: Cloud Storage
    connectionParams:
      - name: uri
        title: File URI
        description: >
          The File URI to use, `file://path/to/file.txt`.
        type: String
        required: false
        validation: "/^.+$/g"
        errorMessage: Invalid File URI.
      - name: contentType
        type: String
        title: Content Type
        description: Description of the file type.
        required: true
        validValues:
          - CSV
          - JSON
          - Parquet
      - name: csv/sep
        type: String
        title: Separator
        description: Character used to delimit fields in the record.
        defaultValue: ","
        required: false
        qualifiedBy: contentType
        validation: "/^.$/g"
        errorMessage: Invalid separator; must be a single character.
      - name: csv/lineSep
        type: String
        title: Line Separator
        description: The line separator that should be used for parsing. Maximum length is 1 character.
        defaultValue: "\n"
        required: false
        qualifiedBy: contentType
        validation: "/^.$/g"
        errorMessage: Invalid line separator; must be a single character.
      - name: csv/encoding
        title: Encoding
        description: decodes the CSV files by the given encoding type.
        type: String
        qualifiedBy: contentType
        defaultValue: "UTF-8"
        required: false
        validation: "/^.+$/g"
        errorMessage: Incorrect encoding type.
      - name: csv/comment
        type: String
        description: sets a single character used for skipping lines beginning with this character.
        defaultValue: "\""
        title: Comment Character
        required: false
        qualifiedBy: contentType
        validation: "/^.$/g"
        errorMessage: Invalid comment character; must be a single character.
      - name: csv/quote
        type: String
        description: Character used to denote quotation marks (single or double quotes).
        defaultValue: "\""
        title: Quote Character
        required: false
        qualifiedBy: contentType
        validation: "/^.$/g"
        errorMessage: Invalid quote character; must be a single character.
      - name: csv/escape
        type: String
        description: Character used to escape values that contain delimiters.
        defaultValue: "\""
        title: Escape Character
        required: false
        qualifiedBy: contentType
        validation: "/^.$/g"
        errorMessage: Invalid escape character; must be a single character.
      - name: csv/multiline
        type: Boolean
        title: Multiline
        description: Parse one record, which may span multiple lines.
        validation: "/^true|false$/g"
        defaultValue: "false"
        errorMessage: Must be true or false.
        required: false
        qualifiedBy: contentType
      - name: csv/header
        type: Boolean
        title: First Line is Header Row
        description: True if the first line of the file contains a header row with column names.
        validation: "/^true|false$/g"
        defaultValue: "false"
        errorMessage: Must be true or false.
        required: false
        qualifiedBy: contentType
      - name: json/style
        type: String
        title: JSON Style
        description: Format style of the JSON file (lines, array, or object).
        defaultValue: "lines"
        validValues:
          - lines
          - array
          - object
        required: true
        qualifiedBy: contentType
        errorMessage: Must be lines, array, or object.
      - name: json/multiline
        type: Boolean
        title: Multiline
        description: Parse one record, which may span multiple lines.
        validation: "/^true|false$/g"
        defaultValue: "false"
        errorMessage: Must be true or false.
        required: false
        qualifiedBy: contentType
      - name: json/lineSep
        type: String
        title: Line Separator
        description: The line separator that should be used for parsing. Maximum length is 1 character.
        defaultValue: "\n"
        required: false
        qualifiedBy: contentType
        validation: "/^.$/g"
        errorMessage: Invalid delimiter; must be a single character.
      - name: json/encoding
        title: Encoding
        description: >
          allows to forcibly set one of standard basic or extended encoding for the JSON files.
          For example UTF-16BE, UTF-32LE.
          If the encoding is not specified and multiLine is set to true, it will be detected automatically.
        type: String
        qualifiedBy: contentType
        defaultValue: "UTF-8"
        required: false
        validation: "/^.+$/g"
        errorMessage: Incorrect encoding type.

Mongo Connections

Mongo connectionParams

When creating a connection definition for mongo, the following parameters are available.

Parameter | Type | Default | Description | Required
username | String |  | The username for authenticating to the database | false
password | String |  | The secret ref containing the password for authenticating to the database. | false
uri | String | mongodb://{host:port}/{database} | The URI string including: database name, username, and password. NOTE: To set a secret variable set the parameter secure: true. See https://docs.mongodb.com/manual/reference/connection-string/ for more details. | true
collection | String |  | Enter the name of the collection to query in the Mongo database. | false
database | String |  | Enter the name of the Mongo database to connect to. | false
sslEnabled | Boolean | false | True if the connection uses SSL encryption when connecting to the database. (Recommended) | false
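
The sketch below shows one way the parameters in this table might be combined in a connection. It follows the layout of the example embedded in the Mongo definition on this page, but uses the parameter name uri from the table rather than mongoUri from that embedded example; which name the platform expects is not confirmed here. The connection name, host, database, collection, and the `#SECURE.` secret reference are hypothetical.

```yaml
name: default/exampleMongoConnection2  # hypothetical connection name
title: Example Mongo Connection
description: Example Mongo Connection
connectionType: mongo
allowWrite: true
params:
  - name: uri
    value: mongodb://mongodb:27017/auto_test
  - name: username
    value: example_user                    # hypothetical user
  - name: password
    value: "#SECURE.example-mongo-password"  # assumed secret reference format
  - name: database
    value: auto_test
  - name: collection
    value: example_collection              # hypothetical collection
  - name: sslEnabled
    value: "true"
```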

Mongo YAML

- name: mongo
  title: MongoDB
  description: |
    Query documents stored in MongoDB.
    Below is an example connection.
    https://docs.mongodb.com/spark-connector/master/
    "```
    name: default/exampleMongoConnection
    title: Example Mongo Connection
    description: Example Mongo Connection
    connectionType: mongo
    allowWrite: true
    params:
      - name: mongoUri
        value: mongodb://mongodb:27017/auto_test
    ```"
  group: cortex
  type: mongo
  tags:
    - label: category.connection.type
      value: NoSQL
    - label: category.connection.type
      value: Document Store
  connectionParams:
    - name: username
      title: Username
      description: The username for authenticating to the database.
      type: String
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect username format.
    - name: password
      title: Password
      description: The secret ref containing the password for authenticating to the database.
      type: String
      secure: true
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect password format.
    - name: uri
      title: Mongo URI
      description: >
        The URI of the Mongo instance to access.
        The connection string of the form mongodb://host:port/ where host can be a hostname, IP address, or UNIX domain socket. If :port is unspecified, the connection uses the default MongoDB port 27017.
        All options can be specified directly in the URI and if both are provided the option in the URI will take precedence.
        See https://docs.mongodb.com/spark-connector/master/configuration#input-configuration for more details.
      type: String
      required: true
      secure: false
      defaultValue: "mongodb://{host:port}/"
      validation: "/^(mongodb.*?):(?:.+)$/g"
      errorMessage: Invalid Mongo URI.
    - name: collection
      description: Enter the name of the collection to query in the Mongo database.
      title: Mongo Collection
      type: String
      required: false
      validation: "/^\\w+$/g"
      errorMessage: Invalid collection name.
    - name: database
      description: Enter the name of the Mongo database to connect to.
      title: Database Name
      type: String
      required: false
      validation: "/^\\w+$/g"
      errorMessage: Invalid database name.
    - name: sslEnabled
      description: >
        True if the connection uses SSL encryption when connecting to the database. (Recommended)
      title: SSL Enabled
      type: Boolean
      required: false
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
    # options not added batchSize, localThreshold, readPreference.name, and readPreference.tagSets

Hive Connections

Hive connectionParams

When creating a connection definition for hive, the following parameters are available.

Parameter | Type | Default | Description | Required
autoCreateAll | Boolean | "true" | Optional flag that can reduce errors with an empty metastore database as of Hive 2.1. | false
schemaVerification | Boolean | "false" | Optional flag that can reduce errors with an empty metastore database as of Hive 2.1. | false
metastoreUri | String |  | The thrift URL of the Hive Metastore Server. | true
connectionURL | String | "jdbc:hive2://{host:port}/{database}" | The JDBC compliant Hive URI used to connect to the database. URI format should conform to this pattern: jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;initFile=<file>;sess_var_list?hive_conf_list#hive_var_list. See https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLFormat for more details. https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hive-metastore.html | true
connectionUserName | String |  | The username for authenticating to the database. | false
connectionPassword | String |  | The password for authenticating as an authorized user. NOTE: To set a secret variable set the parameter secure: true. | false
metastoreVersion | String |  | Version of the Hive Metastore to connect to | true
metastoreJars | String |  | Jars to use when connecting to Hive Metastore, dependent on version of Hive https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options | false
warehouseDir | String | "spark-warehouse" | The location to use for the spark warehouse dir. | false
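
As a sketch, a connection using the hive type could supply the three required parameters plus the warehouse directory as shown below. The instance layout is modeled on the example connection embedded in the Mongo definition on this page. The connection name, host names, ports, database, and metastore version are all hypothetical placeholders; substitute the values for your own Hive deployment.

```yaml
name: default/exampleHiveConnection    # hypothetical connection name
title: Example Hive Connection
description: Example Hive Connection
connectionType: hive
allowWrite: false
params:
  - name: metastoreUri
    value: thrift://hive-metastore:9083           # hypothetical host and port
  - name: connectionURL
    value: jdbc:hive2://hive-server:10000/default # hypothetical host, port, and database
  - name: metastoreVersion
    value: "2.3.9"                                # hypothetical version; match your metastore
  - name: warehouseDir
    value: spark-warehouse
```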

Hive YAML

- name: hive
  title: Hive
  description: >
    Query data in Hive.
    https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_configure_a_spark_hive_connection.html
  group: cortex
  type: hive
  tags:
    - label: category.connection.type
      value: SQL
  connectionParams:
    # - name: datanucleus.schema.autoCreateAll
    - name: autoCreateAll
      title: AutoCreate Schema
      description: Optional flag that can reduce errors with an empty metastore database as of Hive 2.1.
      type: Boolean
      required: false
      validation: "/^true|false$/g"
      defaultValue: "true"
      errorMessage: Must be true or false.
    # - name: hive.metastore.schema.verification
    - name: schemaVerification
      title: Spark Metastore Schema Verification
      description: Optional flag that can reduce errors with an empty metastore database as of Hive 2.1.
      type: Boolean
      required: false
      validation: "/^true|false$/g"
      defaultValue: "false"
      errorMessage: Must be true or false.
    # - name: spark.hadoop.hive.metastore.uris
    - name: metastoreUri
      title: Spark Metastore Uris
      description: The thrift URL of the Hive Metastore Server.
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Incorrect Metastore URL.
    # - name: spark.hadoop.javax.jdo.option.ConnectionURL
    - name: connectionURL
      title: URI
      description: >
        The JDBC compliant Hive URI used to connect to the database. URI format should conform to this pattern: `jdbc:hive2://<host1>:<port1>,<host2>:<port2>/dbName;initFile=<file>;sess_var_list?hive_conf_list#hive_var_list`.
        See https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLFormat for more details.
        https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-hive-metastore.html
      type: String
      required: true
      defaultValue: "jdbc:hive2://{host:port}/{database}"
      validation: "/^jdbc:hive2:(?:.+)$/g"
      errorMessage: Incorrect JDBC URI format.
    # - name: spark.hadoop.javax.jdo.option.ConnectionUserName
    - name: connectionUserName
      title: Username
      description: The username for authenticating to the database.
      type: String
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect username format.
    # - name: spark.hadoop.javax.jdo.option.ConnectionPassword
    - name: connectionPassword
      title: Password
      description: The password for authenticating as an authorized user.
      type: String
      secure: true
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect password format.
    # - name: spark.sql.hive.metastore.version
    - name: metastoreVersion
      title: Hive Metastore Version
      description: Version of the Hive Metastore to connect to
      type: String
      required: true
      validation: "/^.+$/g"
      errorMessage: Incorrect metastore version.
    # - name: spark.sql.hive.metastore.jars
    - name: metastoreJars
      title: Metastore Jars
      description: >
        Jars to use when connecting to Hive Metastore, dependent on version of Hive
        https://docs.databricks.com/data/metastores/external-hive-metastore.html#spark-configuration-options
      type: String
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect Metastore Jars.
    # - name: spark.sql.warehouse.dir
    - name: warehouseDir
      title: Spark Warehouse Dir
      description: The location to use for the spark warehouse dir.
      type: String
      required: false
      defaultValue: "spark-warehouse"
      validation: "/^.+$/g"
      errorMessage: Incorrect Spark Warehouse Dir.
    # options not added spark.hadoop.fs.s3a.credentialsType and spark.hadoop.fs.s3a.stsAssumeRole.arn

JDBC Generic Connections

JDBC-generic ConnectionParams

Parameter | Type | Default | Description | Required
uri | String | jdbc:{protocol}://{host:port}/{database} | A fully qualified JDBC URI containing the dialect, host, port, database and other options. | true
username | String |  | The username that is used to gain access to the database | false
password | String |  | The password that is used for authenticating as an authorized user | false
classname | String |  | The classname of the JDBC driver to be loaded into the cortex runtime | true
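
For illustration, a generic JDBC connection to a PostgreSQL database might be sketched as follows. PostgreSQL is used only as a familiar example; the connection name, host, database, credentials, and the `#SECURE.` secret reference are hypothetical, and the connectionType value is assumed to reference the connection type's name. The instance layout is modeled on the example connection embedded in the Mongo definition on this page.

```yaml
name: default/exampleJdbcConnection    # hypothetical connection name
title: Example JDBC Connection
description: Example JDBC Connection
connectionType: jdbc_generic            # assumed to match the connection type name
allowWrite: false
params:
  - name: uri
    value: jdbc:postgresql://db-host:5432/example_db  # hypothetical host and database
  - name: username
    value: example_user
  - name: password
    value: "#SECURE.example-jdbc-password"            # assumed secret reference format
  - name: classname
    value: org.postgresql.Driver                      # driver class for the PostgreSQL JDBC driver
```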

JDBC-generic.yaml

- name: jdbc_generic
  description: Query data using a JDBC Connection
  title: JDBC Generic
  group: cortex
  type: jdbc
  connectionQueryParams:
    - name: query
      title: SQL Query
      description: An example SQL query to run in order to test connectivity
      type: String
      required: true
  connectionParams:
    - name: uri
      title: URI
      description: A fully qualified JDBC URI containing the dialect, host, port, database and other options.
      type: String
      required: true
      defaultValue: jdbc:{protocol}://{host:port}/{database}
      validation: "/^jdbc:(?:.+)$/g"
      errorMessage: Incorrect JDBC URI format.
    - name: username
      title: Username
      description: The username that is used to gain access to the database.
      type: String
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect username format.
    - name: password
      title: Password
      description: The password that is used for authenticating as an authorized user.
      type: String
      secure: true
      required: false
      validation: "/^.+$/g"
      errorMessage: Incorrect password format.
    - name: classname
      title: Driver Class Name
      description: The classname of the JDBC driver to be loaded into the cortex runtime.
      type: String
      required: true
      validation: "/^([a-zA-Z_$][\\w$]*\\.)*[a-zA-Z_$][\\w$]*$/g"
      errorMessage: Incorrect Java class name format.
  tags:
    - label: category.connection.type
      value: SQL

JDBC CData Connections

JDBC CData Connections are built into a Skill template in the cortex-fabric-examples GitHub repo.

CData is a third-party provider that exposes commonly used data sources (e.g. Salesforce, Twitter) through JDBC connections. When you select a CData connection type in Fabric Console, the parameters available for that connection type are selectable. The links in this table will take you to documentation provided by CData, so you can better understand how to configure these parameters.

Prerequisites for configuring CData connections are found here.

Instructions for working with CData JDBC connectors are available on the CData website.

  1. Download the driver JAR file.

  2. Upload the Driver to Managed Content and make note of the URI.

  3. Go to the CData help website to view the online documentation for your driver.

JDBC-cdata ConnectionParams

Parameter | Type | Default | Description | Required
plugin_properties | String |  | (secure) The key for the JSON-formatted configuration file stored in Managed Content and passed to the plugin at startup | false
classname | string |  | You can find this in the online documentation for your specific driver (under "Getting Started") on the CData help website | true
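
A hedged sketch of a CData connection follows. The driver class name shown is an illustrative value chosen to match the cdata.jdbc.* pattern required by the definition below; confirm the exact class name in the CData documentation for your driver. The connection name and the plugin_properties key are hypothetical, and the connectionType value is assumed to reference the connection type's name. The instance layout is modeled on the example connection embedded in the Mongo definition on this page.

```yaml
name: default/exampleCdataConnection   # hypothetical connection name
title: Example CData Connection
description: Example CData Connection
connectionType: jdbc_cdata              # assumed to match the connection type name
allowWrite: false
params:
  - name: plugin_properties
    value: "#SECURE.example-cdata-properties"      # hypothetical key; assumed secret reference format
  - name: classname
    value: cdata.jdbc.salesforce.SalesforceDriver  # illustrative; verify against CData docs
```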

JDBC-cdata.yaml

- name: jdbc_cdata
  description: Query data using a CDATA JDBC Connection
  title: JDBC CDATA
  group: cortex
  type: jdbc
  connectionQueryParams:
    - name: query
      title: SQL Query
      description: An example SQL query to run in order to test connectivity
      type: String
      required: true
  connectionParams:
    - name: plugin_properties
      description: The JSON-formatted configuration data provided in this field is passed
        to the plugin at startup
      title: Plugin Properties
      type: String
      required: false
      secure: true
      validation: "/^.+$/g"
      errorMessage: Must be JSON formatted string.
    - name: classname
      title: Driver Class Name
      description: The classname of the CDATA JDBC driver to be loaded into the cortex runtime.
      type: String
      required: true
      validation: "/^cdata\\.jdbc\\.[A-z0-9\\.]*$/g"
      errorMessage: Incorrect CDATA driver Java class name format.
  tags:
    - label: category.connection.type
      value: SQL
    - label: CDATA
      value: CDATA
connections:
  - name: content
    title: Cortex Managed Content
    description: Built in storage for files managed by the platform.
    connectionType: managedContent
    allowWrite: true