WorkflowTemplate is the Schema for the WorkflowTemplates API. A Workflow Template is a reusable workflow configuration.
Type
CRD
Group
dataproc.gcp.upbound.io
Version
v1beta1
apiVersion: dataproc.gcp.upbound.io/v1beta1
kind: WorkflowTemplate
WorkflowTemplateSpec defines the desired state of WorkflowTemplate
No description provided.
Required. The Directed Acyclic Graph of Jobs to submit.
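The full example at the end of this page shows a similar pattern; as a quick orientation, a minimal sketch of the jobs DAG under spec.forProvider might look like the following (step IDs, URIs, and class names are placeholders, and the exact field names should be verified against the generated schema):

jobs:
  - stepId: ingest
    hadoopJob:
      - mainJarFileUri: gs://my-bucket/ingest.jar    # placeholder jar URI
  - stepId: transform
    prerequisiteStepIds:
      - ingest                                       # runs only after the "ingest" step succeeds
    sparkJob:
      - mainClass: com.example.Transform             # placeholder class name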
Job is a Hadoop job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
Job is a Hive job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
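Expressed in this CRD's YAML rather than the JSON snippet above, an inline query list might look like the sketch below; the camelCase field names follow the provider's usual convention but are illustrative and should be checked against the schema:

hiveJob:
  - queryList:
      - queries:
          - "CREATE TABLE IF NOT EXISTS t (id INT)"            # one query per list entry
          - "SELECT COUNT(*) FROM t; SELECT * FROM t LIMIT 10" # or several queries separated by semicolons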
Job is a Pig job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
The optional list of prerequisite job step_ids. If not specified, the job will start at the beginning of the workflow.
Job is a Presto job.
Presto client tags to attach to this query.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Job is a PySpark job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
Job scheduling configuration.
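A hedged sketch of per-job scheduling; maxFailuresPerHour and maxFailuresTotal mirror the underlying Dataproc scheduling fields and are assumptions about the generated field names:

scheduling:
  - maxFailuresPerHour: 1   # restart the driver at most once per hour
    maxFailuresTotal: 5     # give up after five total driver failures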
Job is a Spark job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
Job is a SparkR job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
The runtime log config for job execution.
Job is a SparkSql job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Template parameters whose values are substituted into the template. Values for parameters must be provided when the template is instantiated.
Required. Paths to all fields that the parameter replaces. A field is allowed to appear in at most one parameter's list of field paths. A field path is similar in syntax to a google.protobuf.FieldMask; for example, a field path that references the arguments of a Spark step would be specified as jobs['step-id'].sparkJob.args.
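As an illustration, a parameter that substitutes the main class of a Spark step and restricts its allowed values might be declared roughly as below; the nesting of the validation block is an assumption based on the underlying Dataproc API and should be checked against the schema:

parameters:
  - name: MAIN_CLASS
    description: Fully qualified class to run in the spark step
    fields:
      - jobs['spark-step'].sparkJob.mainClass   # field replaced at instantiation time
    validation:
      - values:
          - values:
              - com.example.JobA                # only these values are accepted
              - com.example.JobB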
Required. WorkflowTemplate scheduling information.
A selector that chooses the target cluster for jobs based on metadata. The selector is evaluated at the time each job is submitted.
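For example, instead of a managed cluster, jobs can be routed to an existing cluster by labels; a minimal sketch, assuming clusterLabels is the generated field name:

placement:
  - clusterSelector:
      - clusterLabels:
          env: staging        # jobs run on an existing cluster carrying this label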
A cluster that is managed by the workflow.
Required. The cluster configuration.
Autoscaling config for the policy associated with the cluster. Cluster does not autoscale if this field is unset.
Encryption settings for the cluster.
Port/endpoint configuration for this cluster.
The shared Compute Engine config settings for all instances in a cluster.
Node Group Affinity for sole-tenant clusters.
Reservation Affinity for consuming a zonal reservation.
Required. List of allowed values for the parameter.
The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included: * https://www.googleapis.com/auth/cloud.useraccounts.readonly * https://www.googleapis.com/auth/devstorage.read_write * https://www.googleapis.com/auth/logging.write If no scopes are specified, the following defaults are also provided: * https://www.googleapis.com/auth/bigquery * https://www.googleapis.com/auth/bigtable.admin.table * https://www.googleapis.com/auth/bigtable.data * https://www.googleapis.com/auth/devstorage.full_control
Shielded Instance Config for clusters using Compute Engine Shielded VMs. Structure defined below.
The Compute Engine tags to add to all instances (see https://cloud.google.com/compute/docs/label-or-tag-resources#tags).
Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. You can test a node's role metadata to run an executable on a master or worker node, as shown below using curl (you can also use wget): ROLE=$(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role) if [[ "${ROLE}" == 'Master' ]]; then ... master specific actions ... else ... worker specific actions ... fi
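In the template itself, such startup scripts are referenced through initialization actions; a sketch assuming the generated fields are named executableFile and executionTimeout:

initializationActions:
  - executableFile: gs://my-bucket/scripts/setup-node.sh   # placeholder script URI
    executionTimeout: 600s                                  # give up if the script runs longer than this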
Lifecycle setting for the cluster.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
Security settings for the cluster.
Kerberos-related configuration.
The config settings for software inside the cluster.
The set of components to activate on the cluster.
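A short sketch of the software config with optional components and a property override; the component names and the layout of the properties map are assumptions to be verified against the schema:

softwareConfig:
  - imageVersion: 2.0.35-debian10
    optionalComponents:
      - JUPYTER
      - ZEPPELIN
    properties:
      "dataproc:dataproc.allow.zero.workers": "true"   # example cluster property override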
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
THIS IS A BETA FIELD. It will be honored unless the Management Policies feature flag is disabled. InitProvider holds the same fields as ForProvider, with the exception of Identifier and other resource reference fields. The fields that are in InitProvider are merged into ForProvider when the resource is created. The same fields are also added to the terraform ignore_changes hook, to avoid updating them after creation. This is useful for fields that are required on creation but that we do not wish to update afterwards, for example because an external controller, such as an autoscaler, is managing them.
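For instance, a value that must be supplied at creation but is later managed externally (say, a secondary-worker count handled by an autoscaler) can be placed under spec.initProvider instead of spec.forProvider. The snippet below is a hedged sketch of that pattern, not a complete manifest:

spec:
  initProvider:
    placement:
      - managedCluster:
          - config:
              - secondaryWorkerConfig:
                  - numInstances: 2   # used at creation, then ignored on subsequent updates
  forProvider:
    location: us-central1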
Required. The Directed Acyclic Graph of Jobs to submit.
Job is a Hadoop job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
Job is a Hive job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Job is a Pig job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
The optional list of prerequisite job step_ids. If not specified, the job will start at the beginning of the workflow.
Job is a Presto job.
Presto client tags to attach to this query.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Job is a PySpark job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
Job scheduling configuration.
Job is a Spark job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
Job is a SparkR job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
The runtime log config for job execution.
Job is a SparkSql job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Template parameters whose values are substituted into the template. Values for parameters must be provided when the template is instantiated.
Required. Paths to all fields that the parameter replaces. A field is allowed to appear in at most one parameter's list of field paths. A field path is similar in syntax to a google.protobuf.FieldMask; for example, a field path that references the arguments of a Spark step would be specified as jobs['step-id'].sparkJob.args.
Required. WorkflowTemplate scheduling information.
A selector that chooses the target cluster for jobs based on metadata. The selector is evaluated at the time each job is submitted.
A cluster that is managed by the workflow.
Required. The cluster configuration.
Autoscaling config for the policy associated with the cluster. Cluster does not autoscale if this field is unset.
Encryption settings for the cluster.
Port/endpoint configuration for this cluster.
The shared Compute Engine config settings for all instances in a cluster.
Node Group Affinity for sole-tenant clusters.
Reservation Affinity for consuming a zonal reservation.
Required. List of allowed values for the parameter.
The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included: * https://www.googleapis.com/auth/cloud.useraccounts.readonly * https://www.googleapis.com/auth/devstorage.read_write * https://www.googleapis.com/auth/logging.write If no scopes are specified, the following defaults are also provided: * https://www.googleapis.com/auth/bigquery * https://www.googleapis.com/auth/bigtable.admin.table * https://www.googleapis.com/auth/bigtable.data * https://www.googleapis.com/auth/devstorage.full_control
Shielded Instance Config for clusters using Compute Engine Shielded VMs. Structure defined below.
The Compute Engine tags to add to all instances (see https://cloud.google.com/compute/docs/label-or-tag-resources#tags).
Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. You can test a node's role metadata to run an executable on a master or worker node, as shown below using curl (you can also use wget): ROLE=$(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role) if [[ "${ROLE}" == 'Master' ]]; then ... master specific actions ... else ... worker specific actions ... fi
Lifecycle setting for the cluster.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
Security settings for the cluster.
Kerberos-related configuration.
The config settings for software inside the cluster.
The set of components to activate on the cluster.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
THIS IS A BETA FIELD. It is enabled by default and can be opted out of through a Crossplane feature flag. ManagementPolicies specify the array of actions Crossplane is allowed to take on the managed and external resources. This field is planned to replace the DeletionPolicy field in a future release. Currently, both can be set independently, and non-default values are honored if the feature flag is enabled. If both are custom, the DeletionPolicy field is ignored. See the design doc for more information: https://github.com/crossplane/crossplane/blob/499895a25d1a1a0ba1604944ef98ac7a1a71f197/design/design-doc-observe-only-resources.md?plain=1#L223 and this one: https://github.com/crossplane/crossplane/blob/444267e84783136daa93568b364a5f01228cacbe/design/one-pager-ignore-changes.md
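For example, to have Crossplane only observe an existing template without creating, updating, or deleting it, the policy array can be narrowed; this is a standard Crossplane pattern rather than anything specific to this resource:

spec:
  managementPolicies:
    - Observe   # observe-only; no create, update, or delete actions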
ProviderConfigReference specifies how the provider that will be used to create, observe, update, and delete this managed resource should be configured.
Policies for referencing.
PublishConnectionDetailsTo specifies the connection secret config which contains a name, metadata and a reference to secret store config to which any connection details for this managed resource should be written. Connection details frequently include the endpoint, username, and password required to connect to the managed resource.
WriteConnectionSecretToReference specifies the namespace and name of a Secret to which any connection details for this managed resource should be written. Connection details frequently include the endpoint, username, and password required to connect to the managed resource. This field is planned to be replaced in a future release in favor of PublishConnectionDetailsTo. Currently, both can be set independently, and connection details are published to both without affecting each other.
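Taken together, the provider wiring typically looks like the sketch below; the ProviderConfig name and Secret target are placeholders:

spec:
  providerConfigRef:
    name: default                   # name of an existing GCP ProviderConfig
  writeConnectionSecretToRef:
    name: workflowtemplate-conn     # placeholder Secret name
    namespace: crossplane-system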
WorkflowTemplateStatus defines the observed state of WorkflowTemplate.
No description provided.
Required. The Directed Acyclic Graph of Jobs to submit.
Job is a Hadoop job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
Job is a Hive job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Job is a Pig job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
The optional list of prerequisite job step_ids. If not specified, the job will start at the beginning of the workflow.
Job is a Presto job.
Presto client tags to attach to this query.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Job is a PySpark job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
Job scheduling configuration.
Job is a Spark job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
Job is a SparkR job.
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
The runtime log config for job execution.
Job is a SparkSql job.
HCFS URIs of jar files to be added to the Spark CLASSPATH.
The runtime log config for job execution.
A list of queries.
Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
Template parameters whose values are substituted into the template. Values for parameters must be provided when the template is instantiated.
Required. Paths to all fields that the parameter replaces. A field is allowed to appear in at most one parameter's list of field paths. A field path is similar in syntax to a google.protobuf.FieldMask; for example, a field path that references the arguments of a Spark step would be specified as jobs['step-id'].sparkJob.args.
Required. WorkflowTemplate scheduling information.
A selector that chooses the target cluster for jobs based on metadata. The selector is evaluated at the time each job is submitted.
A cluster that is managed by the workflow.
Required. The cluster configuration.
Autoscaling config for the policy associated with the cluster. Cluster does not autoscale if this field is unset.
Encryption settings for the cluster.
Port/endpoint configuration for this cluster.
The shared Compute Engine config settings for all instances in a cluster.
Node Group Affinity for sole-tenant clusters.
Reservation Affinity for consuming a zonal reservation.
Required. List of allowed values for the parameter.
The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included: * https://www.googleapis.com/auth/cloud.useraccounts.readonly * https://www.googleapis.com/auth/devstorage.read_write * https://www.googleapis.com/auth/logging.write If no scopes are specified, the following defaults are also provided: * https://www.googleapis.com/auth/bigquery * https://www.googleapis.com/auth/bigtable.admin.table * https://www.googleapis.com/auth/bigtable.data * https://www.googleapis.com/auth/devstorage.full_control
Shielded Instance Config for clusters using Compute Engine Shielded VMs. Structure defined below.
The Compute Engine tags to add to all instances (see https://cloud.google.com/compute/docs/label-or-tag-resources#tags).
Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. You can test a node's role metadata to run an executable on a master or worker node, as shown below using curl (you can also use wget): ROLE=$(curl -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role) if [[ "${ROLE}" == 'Master' ]]; then ... master specific actions ... else ... worker specific actions ... fi
Lifecycle setting for the cluster.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
Output only. The list of instance names. Dataproc derives the names from cluster_name, num_instances, and the instance group.
Output only. The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
Output only. The list of instance names. Dataproc derives the names from cluster_name, num_instances, and the instance group.
Output only. The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.
Security settings for the cluster.
Kerberos-related configuration.
The config settings for software inside the cluster.
The set of components to activate on the cluster.
The Compute Engine config settings for additional worker instances in a cluster.
The Compute Engine accelerator configuration for these instances.
Disk option config settings.
Output only. The list of instance names. Dataproc derives the names from cluster_name, num_instances, and the instance group.
Output only. The config for Compute Engine Instance Group Manager that manages this group. This is only used for preemptible instance groups.
Conditions of the resource.
Example (template)
apiVersion: dataproc.gcp.upbound.io/v1beta1
kind: WorkflowTemplate
metadata:
  annotations:
    meta.upbound.io/example-id: dataproc/v1beta1/workflowtemplate
  labels:
    testing.upbound.io/example-name: template
  name: template
spec:
  forProvider:
    jobs:
      - sparkJob:
          - mainClass: SomeClass
        stepId: someJob
      - prerequisiteStepIds:
          - someJob
        prestoJob:
          - queryFileUri: someuri
        stepId: otherJob
    location: us-central1
    placement:
      - managedCluster:
          - clusterName: my-cluster
            config:
              - gceClusterConfig:
                  - tags:
                      - foo
                      - bar
                    zone: us-central1-a
                masterConfig:
                  - diskConfig:
                      - bootDiskSizeGb: 15
                        bootDiskType: pd-ssd
                    machineType: n1-standard-1
                    numInstances: 1
                secondaryWorkerConfig:
                  - numInstances: 2
                softwareConfig:
                  - imageVersion: 2.0.35-debian10
                workerConfig:
                  - diskConfig:
                      - bootDiskSizeGb: 10
                        numLocalSsds: 2
                    machineType: n1-standard-2
                    numInstances: 3