Crawler is the Schema for the Crawlers API
Type: CRD
Group: glue.aws.crossplane.io
Version: v1alpha1
apiVersion: glue.aws.crossplane.io/v1alpha1
kind: Crawler
CrawlerSpec defines the desired state of Crawler
CrawlerParameters defines the desired state of Crawler
ClassifierRefs is a list of references to Classifiers used to set the Classifiers.
Policies for referencing.
ClassifiersSelector selects references to Classifiers used to set the Classifiers.
Policies for selection.
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
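As a rough sketch of the selector alternative to listing classifier references by name, the fragment below selects Classifier resources by label. The field name `classifierSelector` is an assumption inferred from the `classifierRefs` field used in the example manifest, and the label `team: analytics` is hypothetical; `matchLabels` is the standard Crossplane selector key. Check the installed CRD for the exact field name.

```yaml
spec:
  forProvider:
    # Assumed field name; verify against the installed CRD.
    classifierSelector:
      matchLabels:
        # Hypothetical label carried by the Classifier resources to select.
        team: analytics
```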
CrawlerSecurityConfigurationRef is a reference to a SecurityConfiguration used to set the CrawlerSecurityConfiguration.
Policies for referencing.
CrawlerSecurityConfigurationSelector selects references to SecurityConfiguration used to set the CrawlerSecurityConfiguration.
Policies for selection.
DatabaseNameRef is a reference to a Database used to set the DatabaseName.
Policies for referencing.
DatabaseNamesSelector selects references to Database used to set the DatabaseName.
Policies for selection.
Specifies Lake Formation configuration settings for the crawler.
Specifies data lineage configuration settings for the crawler.
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
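A minimal sketch of how such a recrawl policy might be written. The field names `recrawlPolicy` and `recrawlBehavior` are assumptions based on the AWS Glue API's RecrawlPolicy structure, whose documented values include CRAWL_EVERYTHING and CRAWL_NEW_FOLDERS_ONLY; they are not confirmed by this page.

```yaml
spec:
  forProvider:
    # Assumed field names, mirroring the AWS Glue RecrawlPolicy structure.
    recrawlPolicy:
      # Crawl only folders added since the last crawler run.
      recrawlBehavior: CRAWL_NEW_FOLDERS_ONLY
```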
RoleRef is a reference to an IAMRole used to set the Role.
Policies for referencing.
RoleSelector selects references to IAMRole used to set the Role.
Policies for selection.
The policy for the crawler's update and deletion behavior.
A collection of targets to crawl.
Targets is a required field
Specifies Glue Data Catalog targets.
DatabaseNameRef is a reference to a Database used to set the DatabaseName.
Policies for referencing.
DatabaseNamesSelector selects references to Database used to set the DatabaseName.
Policies for selection.
A list of the tables to be synchronized.
Tables is a required field
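The Glue Data Catalog target fields described above might be combined as in the sketch below. The key `catalogTargets` is an assumption that follows the naming of the other target lists in the example manifest (`jdbcTargets`, `s3Targets`), and the table names are hypothetical; `glue-database` is the Database name used in the example manifest.

```yaml
spec:
  forProvider:
    targets:
      # Assumed key name; verify against the installed CRD.
      catalogTargets:
        - databaseNameRef:
            name: glue-database
          # Required list of tables to synchronize (hypothetical names).
          tables:
            - orders
            - customers
```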
Specifies JDBC targets.
ConnectionNameRef is a reference to a Connection used to set the ConnectionName.
Policies for referencing.
ConnectionNamesSelector selects references to Connection used to set the ConnectionName.
Policies for selection.
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler (https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html).
Specifies Amazon DocumentDB or MongoDB targets.
ConnectionNameRef is a reference to a Connection used to set the ConnectionName.
Policies for referencing.
ConnectionNamesSelector selects references to Connection used to set the ConnectionName.
Policies for selection.
Specifies Amazon Simple Storage Service (Amazon S3) targets.
ConnectionNameRef is a reference to a Connection used to set the ConnectionName.
Policies for referencing.
ConnectionNamesSelector selects references to Connection used to set the ConnectionName.
Policies for selection.
DlqEventQueueARNRef is a reference to an SQSEventQueue used to set the DlqEventQueueARN.
Policies for referencing.
DlqEventQueueARNSelector selects references to SQSEventQueue used to set the DlqEventQueueARN.
Policies for selection.
EventQueueARNRef is a reference to an SQSEventQueue used to set the EventQueueARN.
Policies for referencing.
EventQueueARNSelector selects references to SQSEventQueue used to set the EventQueueARN.
Policies for selection.
A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler (https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html).
THIS IS A BETA FIELD. It is on by default but can be opted out through a Crossplane feature flag. ManagementPolicies specify the array of actions Crossplane is allowed to take on the managed and external resources. This field is planned to replace the DeletionPolicy field in a future release. Currently, both could be set independently and non-default values would be honored if the feature flag is enabled. If both are custom, the DeletionPolicy field will be ignored. See the design doc for more information: https://github.com/crossplane/crossplane/blob/499895a25d1a1a0ba1604944ef98ac7a1a71f197/design/design-doc-observe-only-resources.md?plain=1#L223 and this one: https://github.com/crossplane/crossplane/blob/444267e84783136daa93568b364a5f01228cacbe/design/one-pager-ignore-changes.md
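As an illustration of the management policies described above, the sketch below restricts Crossplane to observing an existing crawler without creating, updating, or deleting it. It assumes the feature flag is enabled and that the provider version in use honors this field; the resource name is taken from the example manifest.

```yaml
apiVersion: glue.aws.crossplane.io/v1alpha1
kind: Crawler
metadata:
  name: glue-crawler
spec:
  # Observe-only: Crossplane reads the external resource but does not change it.
  managementPolicies: ["Observe"]
  forProvider:
    region: us-east-1
```

The rest of `forProvider` is omitted here for brevity, so this fragment is not a complete manifest.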
ProviderConfigReference specifies how the provider that will be used to create, observe, update, and delete this managed resource should be configured.
Policies for referencing.
PublishConnectionDetailsTo specifies the connection secret config which contains a name, metadata and a reference to secret store config to which any connection details for this managed resource should be written. Connection details frequently include the endpoint, username, and password required to connect to the managed resource.
WriteConnectionSecretToReference specifies the namespace and name of a Secret to which any connection details for this managed resource should be written. Connection details frequently include the endpoint, username, and password required to connect to the managed resource. This field is planned to be replaced in a future release in favor of PublishConnectionDetailsTo. Currently, both could be set independently and connection details would be published to both without affecting each other.
CrawlerStatus defines the observed state of Crawler.
CrawlerObservation defines the observed state of Crawler
The status of the last crawl, and potentially error information if an error occurred.
Conditions of the resource.
glue-crawler
apiVersion: glue.aws.crossplane.io/v1alpha1
kind: Crawler
metadata:
  name: glue-crawler
spec:
  forProvider:
    classifierRefs:
      - name: glue-classifier-csv
    crawlerSecurityConfigurationRef:
      name: glue-securityconfiguration
    databaseNameRef:
      name: glue-database
    region: us-east-1
    roleRef:
      name: glue-role
    schedule: CroN(0/5 * * * ? *)
    schemaChangePolicy:
      deleteBehavior: LOG
      updateBehavior: UPDATE_IN_DATABASE
    tags:
      glue: crawler
      spider: speedy
    targets:
      dynamoDBTargets:
        - path: dynamoDBpath
        - path: anotherPATH
          scanAll: false
          scanRate: 1.1
      jdbcTargets:
        - connectionNameRef:
            name: glue-connection-jdbc
          exclusions:
            - myFolder2/*
            - "*.{csv,avro}"
      mongoDBTargets:
        - connectionNameRef:
            name: glue-connection-mongodb
          path: dbName/collectionName
          scanAll: false
      s3Targets:
        - connectionNameRef:
            name: glue-connection-network
          dlqEventQueueArnRef:
            name: test-queue2
          eventQueueArnRef:
            name: test-queue
          exclusions:
            - myFolder2/*
            - "*.{csv,avro}"
          path: s3://bucket/prefix/object
          sampleSize: 123
  providerConfigRef:
    name: example
© 2022 Upbound, Inc.