![]()

Amazon Redshift is a fully managed and highly scalable data warehouse service in the cloud. You can start with a few hundred GB of data and scale up to a petabyte or more. Redshift falls under the database section of Amazon Web Services (AWS) and is based on PostgreSQL. Redshift lets you scale the database as your data grows, so it gets easier to focus on the analysis part of BI solutions. Read the official Redshift documentation for more.

In this blog, I will explain the steps to set up an Amazon Redshift cluster, at least for one test instance, and the process of connecting to it using PDI.

First of all, visit Amazon AWS and register for an AWS account. Simply provide all the basic information. This is the home page for AWS and the place where you can view all the products and services provided by Amazon.

The first step in creating a data warehouse is to launch a set of nodes, called an Amazon Redshift cluster. After you provision your cluster, you can upload your data set and then perform data analysis queries.

Log into Redshift from the AWS Management Console and click on “Launch Cluster”. Next, you will be asked to provide the details below:

- Cluster Identifier: The unique name you give to your cluster.
- Database Name: The name of your database.
- Database Port: The port on which the database accepts connections.
- Master Username and Password: The usual credentials for the cluster.

Once you are done with the cluster configuration, the next step is to define the nodes.

![]()

For this blog, I am using a single-node cluster type along with the dc1.large node type. Based on your usage, you can select from a list of node types with different memory and storage configurations. In the above image, the dc1.large node type is selected for a single-node cluster with one compute node.

In the Additional Configurations section (after you hit “Continue” from step 2), you are asked to choose the VPC (Virtual Private Cloud), security details, database encryption, etc. All the nodes are configured in the VPC by default.

![]()

AWS will provide a default configuration of the VPC. We need to tweak this default configuration, as explained below.

For now, simply click on CONTINUE, and you are done with the single-node Redshift cluster configuration.

Step 4: Changing the Security User Group in EC2

Below is an image of a test cluster that I have created.

You cannot connect to the Redshift database through clients unless the security group allows access from your IP address on the database's TCP port.

EC2 allows users to rent virtual computers on which to run their own computer applications. EC2 allows scalable deployment of applications by providing a web service through which a user can boot an Amazon Machine Image to create a virtual machine, which Amazon calls an “instance”, containing any software desired.

Now click on the EC2 Dashboard (from the AWS Management Console) and select the Security Groups tab on the right. There you will find the security groups that are available. You need to change the inbound rules for the security group.

In the above image, for the default VPC security group created during the Redshift cluster configuration, I have added a few extra rules. I have made the SSH and All TCP types publicly accessible. This enables clients and third-party tools to easily access the Redshift cluster. Ideally, if you are using the cluster for serious work or confidential data, it is recommended to restrict the inbound rule to a specific public IP address. If the cluster is only for test purposes (as in my case), it is OK to skip the IP restriction and accept all IPs.

See the AWS documentation for more on authorizing inbound rules.

Step 5: Using PDI/Kettle to Access Redshift

Accessing Redshift using Kettle is straightforward once you have properly configured the cluster as mentioned above. Since Redshift is basically a PostgreSQL database, Kettle accesses the cluster using the PostgreSQL database type. In the Database Connection section of Kettle, select PostgreSQL as the Connection Type.

![]() ![]()
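Because Redshift is reached through a PostgreSQL-type connection, the settings a client such as Kettle needs boil down to a host, a port and a database name. Here is a minimal Python sketch of how those pieces fit into the standard PostgreSQL JDBC URL form. The class name `RedshiftConnection` and the endpoint in the example are hypothetical and only for illustration. 5439 is Redshift's default port, so use whatever Database Port you entered at launch. Kettle assembles its own URL internally, so treat this as a rough picture, not as what Kettle literally does.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RedshiftConnection:
    """Illustrative holder for the values typed into a PostgreSQL-type connection."""

    host: str         # cluster endpoint (the value used below is made up)
    database: str     # the Database Name chosen when launching the cluster
    port: int = 5439  # Redshift's default port; replace with your Database Port

    def jdbc_url(self) -> str:
        # Standard PostgreSQL JDBC URL form: jdbc:postgresql://host:port/database
        db = quote(self.database, safe="")
        return f"jdbc:postgresql://{self.host}:{self.port}/{db}"


# Hypothetical endpoint and database name, for illustration only.
conn = RedshiftConnection(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="dev",
)
print(conn.jdbc_url())
```

If the connection fails with a timeout, the inbound rule of the security group is the first thing to check, since the port in this URL must be reachable from your IP address.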