Connecting Rapid Miner Studio to GCP Dataproc on GCE

Overview

Establishing connectivity between Rapid Miner Studio and GCP Dataproc is challenging because neither vendor has tested such a scenario. As part of modernizing our Hadoop platform, we decided to explore the capabilities of Dataproc and the possibility of running ETL or AI models inside Dataproc, with jobs triggered from Rapid Miner Studio and workflow code compilation taking place in Rapid Miner AI Hub. In this article, we explain how we established connectivity from Rapid Miner Studio to Dataproc, which settings are required, which challenges we faced, and how we resolved them.

Make sure you have access to GCP and authorization to create a Dataproc cluster, VM machines, and GCS buckets, along with KMS access, VPC admin rights, and IAM admin rights.

Setup

- Create a Dataproc cluster (GCE instance based, not serverless) on GCP. We did not enable Kerberos in the cluster because there was an issue with generating delegation tokens.
- Install Rapid Miner Studio on a Windows workstation.
- Run Rapid Miner AI Hub as containers inside an Autopilot GKE cluster on GCP. Containerization is not mandatory; you can run it in standalone mode too.
- Network connectivity must be opened between the source (the RM Studio machine) and the VPC where Dataproc is installed, and the destination port must be opened on the Dataproc VPC firewall.

Create Dataproc Cluster

You can create the Dataproc cluster through the GCP console portal or through the CLI. We kept the Hive data on GCS (we had to point the location to a GCS bucket while creating the table). Example:

gcloud dataproc clusters create example-dataproc-spark248 \
  --autoscaling-policy dataproc-rapidminer-as-policy \
  --enable-component-gateway \
  --bucket dataprocgke \
  --region us-west1 \
  --subnet us-west1 \
  --zone us-west1-b \
  --master-machine-type n1-standard-4 \
  --master-boot-disk-size 50 \
  --num-workers 4 \
  --worker-machine-type n1-standard-4 \
  --worker-boot-disk-size 50 \
  --image-version 1.4-debian10 \
  --optional-components ZEPPELIN,ZOOKEEPER \
  --gce-pd-kms-key projects/example-210412/locations/us-west1/keyRings/Thales-KeyRing-UsWest1/cryptoKeys/ekm-key1 \
  --project example-210412

Note that the autoscaling policy "dataproc-rapidminer-as-policy" was created prior to executing the above command. In this case, we used a KMS key provided by Thales; the external key "ekm-key1" was also created, and the GCS bucket "dataprocgke" was created earlier too. Replace the project "example-210412" with the correct one; you will get a similar path, i.e. "projects/example-210412/locations/us-west1/keyRings/Thales-KeyRing-UsWest1/cryptoKeys/ekm-key1".

In order to connect to the Dataproc cluster, you need a functional user. For example, on Ubuntu:

sudo adduser --ingroup hadoop demouser1
sudo adduser demouser1 hdfs

On Red Hat:

useradd -g hadoop -G hdfs demouser1
passwd demouser1
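The autoscaling policy referenced by --autoscaling-policy has to exist before the cluster is created. A minimal sketch of defining and importing one, assuming illustrative min/max worker counts and scaling factors (tune these to your workload):

```shell
# Define a basic YARN-based autoscaling policy (values are illustrative).
cat > dataproc-rapidminer-as-policy.yaml <<'EOF'
workerConfig:
  minInstances: 2
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 2m
  yarnConfig:
    scaleUpFactor: 0.5
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 1h
EOF

# Import the policy into the same region as the cluster.
gcloud dataproc autoscaling-policies import dataproc-rapidminer-as-policy \
  --source dataproc-rapidminer-as-policy.yaml \
  --region us-west1
```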
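The key ring and external (EKM) key used for --gce-pd-kms-key also have to exist before cluster creation. A hedged sketch, assuming an internet-based Cloud EKM key; the external key URI below is a placeholder for the key hosted in the external key manager (Thales, in our case):

```shell
# Create the key ring in the cluster's region.
gcloud kms keyrings create Thales-KeyRing-UsWest1 --location us-west1

# Create an externally managed key; replace the placeholder URI with the
# actual key URI from your external key manager.
gcloud kms keys create ekm-key1 \
  --keyring Thales-KeyRing-UsWest1 \
  --location us-west1 \
  --purpose encryption \
  --protection-level external \
  --external-key-uri "https://ekm.example.com/v0/keys/placeholder"
```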
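Opening the firewall between RM Studio and the Dataproc VPC can be sketched as below. The network name, source range, and ports are assumptions for illustration; use the actual Studio machine IP and the service ports your workflows need:

```shell
# Hypothetical example: allow the RM Studio machine (here 203.0.113.10/32)
# to reach typical Hadoop service ports on the Dataproc cluster's network.
gcloud compute firewall-rules create allow-rmstudio-to-dataproc \
  --network default \
  --direction INGRESS \
  --action ALLOW \
  --rules tcp:8020,tcp:8032,tcp:10000 \
  --source-ranges 203.0.113.10/32
```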
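Keeping Hive data on GCS means creating the bucket up front and pointing table locations at it. A minimal sketch, assuming a hypothetical table name, schema, and path:

```shell
# Create the staging/data bucket in the cluster's region (done before
# cluster creation, since --bucket references it).
gsutil mb -l us-west1 gs://dataprocgke

# Create a Hive table whose data lives on GCS; table and schema are
# illustrative, not from the original setup.
gcloud dataproc jobs submit hive \
  --cluster example-dataproc-spark248 \
  --region us-west1 \
  --execute "CREATE EXTERNAL TABLE demo_sales (id INT, amount DOUBLE)
             STORED AS PARQUET
             LOCATION 'gs://dataprocgke/warehouse/demo_sales/'"
```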