Recently, we faced a unique challenge: setting up DevOps and management for a relatively complex Hadoop cluster on the Amazon EC2 cloud. The obvious choice was a configuration management tool. We had used Opscode's Chef extensively, and given the flexibility and extensibility it provides, Chef was the natural pick. While looking for best practices for managing a Hadoop cluster with Chef, we stumbled upon Ironfan.

What is Ironfan?
In short, Ironfan is open-sourced by InfoChimps. It provides an abstraction on top of Chef that lets users easily provision, deploy and manage a cluster of servers, whether a simple web application or a complex Hadoop cluster. After a few experiments, we were convinced that Ironfan was the right tool. It removes a lot of repetitive configuration while retaining the goodness of Chef. This post shows how easy it is to set up and manage a Hadoop cluster using Ironfan.

Prerequisites:
- A Chef account (Hosted or Private) with knife.rb set up correctly on your client machine
- A Ruby setup (using RVM or otherwise)

Installation:
Install Ironfan on your machine using the steps mentioned here. Once all the packages are set up correctly, perform these sanity checks:
- Ensure that the environment variable CHEF_USERNAME is your Chef server username (unless your USER environment variable is the same as your Chef username).
- Ensure that the environment variable CHEF_HOMEBASE points to the directory that contains the expanded-out knife.rb.
- ~/.chef should be a symbolic link to the knife directory in your CHEF_HOMEBASE.
- Your knife/knife.rb file should be unmodified.
- Your Chef user PEM file should be in knife/credentials/{username}.pem.
- Your organization's Chef validator PEM file should be in knife/credentials/{organization}-validator.pem.
- Your knife/credentials/knife-{organization}.rb file should contain:
  - your Chef organization
  - the chef_server_url
  - the validation_client_name
  - the path to the validation_key
  - the aws_access_key_id / aws_secret_access_key
  - an AMI ID of an AMI you'd like to be able to boot, in ec2_image_info

Finally, rename the example_clusters directory in the homebase to clusters. These are sample clusters that ship with Ironfan. Then run the knife cluster list command:

$ knife cluster list
Cluster Path: /.../homebase/clusters
+------------+--------------------------------------+
| cluster    | path                                 |
+------------+--------------------------------------+
| big_hadoop | /.../homebase/clusters/big_hadoop.rb |
| burninator | /.../homebase/clusters/burninator.rb |
...

Defining Cluster:
Now let's define a cluster. In Ironfan, a single file defines a cluster and describes all the configuration it needs. You can customize your cluster spec as follows:
- Define cloud provider settings
- Define base roles
- Define various facets
- Define facet-specific roles and recipes
- Override properties of a particular facet server instance

Defining cloud provider settings:
Ironfan currently supports the AWS and Rackspace cloud providers. We will use AWS as an example. For AWS you can provide configuration such as:
- the region in which the servers will be deployed
- the availability zones to be used
- EBS-backed or instance-store-backed servers
- the base image (AMI) used to spawn servers
- the security groups and their allowed port ranges

Defining Base Roles:
You can define global roles for a cluster. These roles are applied to all servers unless they are explicitly overridden for a particular facet or server.
All the available roles are defined in the $CHEF_HOMEBASE/roles directory. You can also create a custom role and use it in your cluster config.

Defining Environment:
Environments in Chef let you manage separate environments, such as production, staging, development and testing, with one Chef setup (or one organization on Hosted Chef). With environments, you can specify per-environment run lists in roles, per-environment cookbook versions, and environment attributes. The available environments are in the $CHEF_HOMEBASE/environments directory, and you can create and use custom environments.

Ironfan.cluster 'my_first_cluster' do
  # Environment under which the Chef nodes will be placed
  environment :dev

  # Global roles for all servers
  role :systemwide
  role :ssh

  # Global EC2 cloud settings
  cloud(:ec2) do
    permanent           true
    region              'us-east-1'
    availability_zones  ['us-east-1c', 'us-east-1d']
    flavor              't1.micro'
    backing             'ebs'
    image_name          'ironfan-natty'
    chef_client_script  'client.rb'
    security_group(:ssh).authorize_port_range(22..22)
    mount_ephemerals
  end

  # Facet definitions (see below) go inside this block
end

Defining Facets:
Facets are groups of servers within a cluster, and the servers in a facet share common attributes and roles. For example, if your cluster has 2 app servers and 2 database servers, you can group the app servers under an app_server facet and the database servers under a database facet.

Defining Facet specific roles and recipes:
You can define roles and recipes specific to a facet. A facet can even override the global cloud settings.

facet :master do
  instances 1
  recipe 'nginx'
  cloud(:ec2) do
    flavor 'm1.small'
    security_group(:web) do
      authorize_port_range(80..80)
      authorize_port_range(443..443)
    end
  end
  role :hadoop_namenode
  role :hadoop_secondarynn
  role :hadoop_jobtracker
  role :hadoop_datanode
  role :hadoop_tasktracker
end

facet :worker do
  instances 2
  role :hadoop_datanode
  role :hadoop_tasktracker
end

In the above example we have defined a facet for the Hadoop master node and a facet for the worker nodes.
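Facet overrides are easiest to see by expanding a spec by hand. The Python sketch below is a conceptual model only, not Ironfan's implementation. The `expand_cluster` function and its dictionary layout are hypothetical. It merges the cluster-wide cloud settings with each facet's overrides. It also names servers with the `<cluster>-<facet>-<index>` pattern seen in the server names that `knife cluster show` reports for this cluster.

```python
from typing import Dict, List


def expand_cluster(cluster: str, cloud: Dict[str, str],
                   facets: Dict[str, dict]) -> List[dict]:
    """Expand a (hypothetical) cluster spec into one entry per server.

    A facet's own "cloud" settings take precedence over the cluster-wide
    ones, mirroring how a facet can override the global cloud(:ec2) block.
    """
    servers = []
    for facet_name, facet in facets.items():
        settings = {**cloud, **facet.get("cloud", {})}  # facet overrides win
        for index in range(facet["instances"]):
            servers.append({
                "name": f"{cluster}-{facet_name}-{index}",
                "roles": list(facet.get("roles", [])),
                **settings,
            })
    return servers


if __name__ == "__main__":
    global_cloud = {"flavor": "t1.micro", "region": "us-east-1"}
    spec = {
        "master": {"instances": 1, "cloud": {"flavor": "m1.small"},
                   "roles": ["hadoop_namenode", "hadoop_secondarynn",
                             "hadoop_jobtracker", "hadoop_datanode",
                             "hadoop_tasktracker"]},
        "worker": {"instances": 2,
                   "roles": ["hadoop_datanode", "hadoop_tasktracker"]},
    }
    for server in expand_cluster("hadoop_job001", global_cloud, spec):
        print(server["name"], server["flavor"])
```

Run directly, it prints `hadoop_job001-master-0 m1.small`, `hadoop_job001-worker-0 t1.micro` and `hadoop_job001-worker-1 t1.micro`. The master picks up its flavor override, and the workers inherit the cluster default.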
The master facet runs 1 instance and the worker facet runs 2. Each facet has its own set of roles. For the master facet we have overridden the EC2 flavor setting with m1.small. The master node's security group also accepts incoming traffic on ports 80 and 443.

Cluster Management:
Now that the cluster configuration is ready, let's get hands-on with cluster management. All cluster configuration files live under the $CHEF_HOMEBASE/clusters directory. We will save our new config file as hadoop_job001_cluster.rb. The new cluster should then appear in the cluster list.

List Clusters:
$ knife cluster list
Cluster Path: /.../homebase/clusters
+---------------+--------------------------------------------+
| cluster       | path                                       |
+---------------+--------------------------------------------+
| hadoop_job001 | HOMEBASE/clusters/hadoop_job001_cluster.rb |
+---------------+--------------------------------------------+

Show Cluster Configuration:
$ knife cluster show hadoop_job001
Inventorying servers in hadoop_job001 cluster, all facets, all servers
my_first_cluster: Loading chef
my_first_cluster: Loading ec2
my_first_cluster: Reconciling DSL and provider information
+------------------------+-------+-------------+----------+------------+-----+
| Name                   | Chef? | State       | Flavor   | AZ         | Env |
+------------------------+-------+-------------+----------+------------+-----+
| hadoop_job001-master-0 | no    | not running | m1.small | us-east-1c | dev |
| hadoop_job001-client-0 | no    | not running | t1.micro | us-east-1c | dev |
| hadoop_job001-client-1 | no    | not running | t1.micro | us-east-1c | dev |
+------------------------+-------+-------------+----------+------------+-----+

Launch Cluster:
Launch Whole Cluster:
$ knife cluster launch hadoop_job001
Loaded information for 3 computer(s) in cluster my_first_cluster
+------------------------+-------+---------+----------+------------+-----+------------+---------------+--------------+------------+
| Name                   | Chef? | State   | Flavor   | AZ         | Env | MachineID  | Public IP     | Private IP   | Created On |
+------------------------+-------+---------+----------+------------+-----+------------+---------------+--------------+------------+
| hadoop_job001-master-0 | yes   | running | m1.small | us-east-1c | dev | i-c9e117b5 | 101.23.157.51 | 10.106.57.77 | 2012-12-10 |
| hadoop_job001-client-0 | yes   | running | t1.micro | us-east-1c | dev | i-cfe117b3 | 101.23.157.52 | 10.106.57.78 | 2012-12-10 |
| hadoop_job001-client-1 | yes   | running | t1.micro | us-east-1c | dev | i-cbe117b7 | 101.23.157.52 | 10.106.57.79 | 2012-12-10 |
+------------------------+-------+---------+----------+------------+-----+------------+---------------+--------------+------------+

Launch a single instance of a facet:
$ knife cluster launch hadoop_job001 master 0

Launch all instances of a facet:
$ knife cluster launch hadoop_job001 worker

Stop Whole Cluster:
$ knife cluster stop hadoop_job001

Stop a single instance of a facet:
$ knife cluster stop hadoop_job001 master 0

Stop all instances of a facet:
$ knife cluster stop hadoop_job001 worker

Setting up and managing a Hadoop cluster cannot get much easier than this! To recap, Ironfan is open-sourced by InfoChimps. It is a systems provisioning and deployment tool that automates the configuration of entire systems to support the full Big Data stack, including tools for data ingestion, scraping, storage, computation, and monitoring.

We are also exploring another tool for Hadoop cluster management: Apache Ambari. We will post our findings and comparisons soon, so stay tuned!
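The installation sanity checks listed earlier are easy to forget when you return to a homebase after a while, so they are worth scripting. The following Python sketch is a hypothetical helper: `preflight` is not part of Ironfan or knife. It checks only the file layout and settings named in those checks. For knife-{organization}.rb it does a crude substring search for each expected setting name rather than parsing the Ruby.

```python
import os
import sys
from pathlib import Path
from typing import List, Optional

# Settings that knife-{organization}.rb is expected to mention (from the checklist).
REQUIRED_KNIFE_SETTINGS = [
    "chef_server_url", "validation_client_name", "validation_key",
    "aws_access_key_id", "aws_secret_access_key", "ec2_image_info",
]


def preflight(homebase: str, username: str, organization: str,
              home: Optional[str] = None) -> List[str]:
    """Return the problems found; an empty list means every check passed."""
    problems = []
    knife_dir = Path(homebase) / "knife"
    creds = knife_dir / "credentials"

    if not (knife_dir / "knife.rb").is_file():
        problems.append(f"no knife.rb under {knife_dir}")

    # ~/.chef should be a symbolic link to the homebase knife directory.
    dot_chef = Path(home or os.path.expanduser("~")) / ".chef"
    if not dot_chef.is_symlink():
        problems.append(f"{dot_chef} is not a symbolic link")
    elif dot_chef.resolve() != knife_dir.resolve():
        problems.append(f"{dot_chef} does not point to {knife_dir}")

    # User and validator PEM files.
    for pem in (creds / f"{username}.pem",
                creds / f"{organization}-validator.pem"):
        if not pem.is_file():
            problems.append(f"missing {pem}")

    # Crude check: each expected setting name appears somewhere in the file.
    org_rb = creds / f"knife-{organization}.rb"
    if not org_rb.is_file():
        problems.append(f"missing {org_rb}")
    else:
        text = org_rb.read_text()
        problems += [f"{org_rb.name} does not mention {s}"
                     for s in REQUIRED_KNIFE_SETTINGS if s not in text]
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python preflight.py <organization>
    homebase = os.environ.get("CHEF_HOMEBASE", "")
    # Fall back to USER when CHEF_USERNAME is unset, as the checklist allows.
    user = os.environ.get("CHEF_USERNAME") or os.environ.get("USER", "")
    org = sys.argv[1] if len(sys.argv) > 1 else ""
    for problem in preflight(homebase, user, org) or ["all checks passed"]:
        print(problem)
```

The function returns a list instead of stopping at the first failure, so a single run reports everything that needs fixing.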
Aziro Marketing
Amazon CloudWatch is an Amazon Web Services utility for monitoring components such as EC2 instances, EBS volumes and Elastic Load Balancers. For EC2 instances, we can monitor CPUUtilization, DiskReadBytes, DiskReadOps, DiskWriteBytes, NetworkIn and NetworkOut. More often than not, end users want to monitor more than these built-in parameters, e.g. free memory, free swap and so on. Amazon CloudWatch provides custom metrics to address this. You can define a custom metric for your needs and keep feeding it data with a simple bash or Python script that runs in a loop.

Let's take free memory as an example. The aim is to define a custom metric for free memory and continuously feed it data from the machine being monitored.

Install and set up the AWS CloudWatch command line tools from http://aws.amazon.com/developertools/2534

Set up the API as you would for any AWS command line tool:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/jre/
export AWS_CLOUDWATCH_HOME=/opt/cloudwatch/   # or the location where you unzipped the utility
export PATH=$AWS_CLOUDWATCH_HOME/bin:$PATH

To define a new metric, e.g. FreeMemory:

ubuntu@domU-12-31-32-0B-01-A7:~$ mon-put-data -m "FreeMemory" --namespace Clogeny --dimensions "instance=i-f23233,servertype=MongoDB" --value 100 -u Bytes

This command creates the FreeMemory metric, which may take about 20 minutes to show up. You can customize the namespace and dimensions to your needs. For now we have used a dummy value, which will later be replaced with real data, and we chose Bytes as the unit. More information about the options is available in the CloudWatch command line reference at http://docs.amazonwebservices.com/AmazonCloudWatch/latest/DeveloperGuide/index.html?CLIReference.html

Once the metric is created, we need to keep feeding data to it.
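Quoting is the easiest part of these commands to get wrong. When you call mon-put-data from a script, passing an argument list is safer than building a shell string. The following is a minimal Python sketch that uses only the flags shown above. `put_data_args` is a hypothetical helper name, and the dimension values are the example ones from this post.

```python
from typing import Dict, List


def put_data_args(metric: str, namespace: str, dimensions: Dict[str, str],
                  value: float, unit: str) -> List[str]:
    """Build a mon-put-data argument list (no shell involved, so no quoting issues)."""
    # Dimensions go in as one comma-separated string of name=value pairs.
    dims = ",".join(f"{name}={val}" for name, val in dimensions.items())
    return ["mon-put-data", "-m", metric, "--namespace", namespace,
            "--dimensions", dims, "--value", str(value), "-u", unit]


if __name__ == "__main__":
    args = put_data_args("FreeMemory", "Clogeny",
                         {"instance": "i-f23233", "servertype": "MongoDB"},
                         100, "Bytes")
    print(args)
    # To actually send the data point: subprocess.call(args)
```

Keeping the dimensions in a dictionary makes it easy to add more, e.g. a role or an environment name, without touching the command-building code.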
/proc/meminfo contains information about the current memory status of the system and can serve as the data source. Here is a simple Python script that feeds FreeMemory with the required data:

import subprocess
import time

while True:
    # MemFree in /proc/meminfo is reported in kB; convert it to bytes.
    with open('/proc/meminfo') as f:
        kb = next(int(line.split()[1]) for line in f if line.startswith('MemFree:'))
    # Push the value to the FreeMemory metric; it then shows up on the CloudWatch dashboard.
    subprocess.call(['mon-put-data', '-m', 'FreeMemory', '--namespace', 'Clogeny',
                     '--dimensions', 'instance=i-f23233,servertype=MongoDB',
                     '--value', str(kb * 1024), '-u', 'Bytes'])
    time.sleep(60)  # sampling interval; 60 seconds is an arbitrary choice

This loop gives you your own little agent that reports a free-memory metric for your machine. After it has run for a while, you should see a graph of the metric in the AWS CloudWatch console.

Deleting a Custom Metric
You cannot explicitly delete a custom metric. If the metric stays unused for 2 weeks, it is deleted automatically.

Cost
$0.50 per metric per month

Summary
As you can see, adding a custom metric is easy. In this example we have shown how to add a FreeMemory metric. You can add several other useful metrics the same way, such as FreeSwap, ProcessAvailability and DiskSpace. Aziro (formerly MSys Technologies), as a leading AWS cloud services provider, can help you do that.
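The same /proc/meminfo source also covers FreeSwap, one of the other metrics mentioned above. Here is a small Python sketch that parses the whole file into bytes. `parse_meminfo` is a hypothetical helper, and it assumes each line has the usual `Name:  value [kB]` layout.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value}.

    Values with a 'kB' suffix are converted to bytes; unit-less values
    (such as HugePages_Total, which is a page count) are kept as-is.
    """
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[name.strip()] = value
    return result


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    # Candidate values for the FreeMemory and FreeSwap custom metrics.
    print("FreeMemory", info["MemFree"], "Bytes")
    print("FreeSwap", info["SwapFree"], "Bytes")
```

Parsing the whole file once per sample means a single read can feed several metrics, instead of one grep per metric.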
Amazon EC2 has recently released PV-GRUB loader-supported kernels that let you boot your own kernels. The PV-GRUB loader simply chain-boots the kernel provided in the associated AMI (Amazon Machine Image). As a result, your instance runs the kernel in the AMI instead of the kernel specified in the boot process. This is a hugely helpful feature for anyone who wants to load their own customized kernels into EC2's virtual machines. This article describes how we successfully booted a customized (extra patches) RHEL/CentOS 5.5 kernel on EC2.

Prerequisites:
- An Amazon EC2 account (obviously!)
- Knowledge of running EC2 instances and bundling EC2 images using ElasticFox or the command-line EC2 tools
- The kernel you want to build
- The patches you want to apply

NOTE: You might think, "Wow! With this feature I can boot any damn kernel in the world." Well, chances are you can. However, Amazon only guarantees that the kernels mentioned below will boot; for the rest, you need to test your luck:
Amazon Elastic Compute Cloud (EC2) provides scalable virtual private servers using Xen. Instances running on Xen periodically sync their wall clock with the underlying hypervisor, so changing the date and time requires a few extra configuration steps.

On a plain Linux machine, you can change the date and time by stopping the ntpd service and setting the date:

# date -s "2 OCT 2006 18:00:00"

But on a Xen-based virtual instance it's not this simple! The above command will not throw an error, but it will not change the date either. To change the date on a Xen-based instance, you first need to set the wall clock to run independently of Xen. This takes a single command:

# echo 1 > /proc/sys/xen/independent_wallclock

To keep the setting across reboots, add the following line to the end of /etc/sysctl.conf:

xen.independent_wallclock = 1

To re-sync the wall clock with Xen, run:

# echo 0 > /proc/sys/xen/independent_wallclock
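A provisioning script can combine the live /proc switch and the persistent sysctl.conf line in one step. The following is a minimal Python sketch. `set_independent_wallclock` is a hypothetical helper, and the paths are parameters so you can try it against scratch files. Writing the real files requires root on a Xen guest that exposes /proc/sys/xen/independent_wallclock. One design choice belongs to this sketch only: when re-syncing, it also removes the sysctl.conf line, whereas the steps above only cover the `echo 0` part.

```python
from pathlib import Path

SYSCTL_LINE = "xen.independent_wallclock = 1"


def set_independent_wallclock(enabled: bool,
                              proc_path: str = "/proc/sys/xen/independent_wallclock",
                              sysctl_path: str = "/etc/sysctl.conf") -> None:
    """Switch the Xen wall clock mode now and keep sysctl.conf in step.

    enabled=True  -> like `echo 1 > .../independent_wallclock`, plus one
                     persistent sysctl.conf line (added only if missing).
    enabled=False -> like `echo 0 > ...` (re-sync with Xen); the persistent
                     line is removed (a choice of this sketch).
    """
    Path(proc_path).write_text("1\n" if enabled else "0\n")

    sysctl = Path(sysctl_path)
    lines = sysctl.read_text().splitlines() if sysctl.exists() else []
    # Drop any existing entry so the file ends up with at most one.
    kept = [line for line in lines
            if not line.replace(" ", "").startswith("xen.independent_wallclock=")]
    if enabled:
        kept.append(SYSCTL_LINE)
    sysctl.write_text("\n".join(kept) + ("\n" if kept else ""))
```

Because it rewrites the sysctl.conf entry rather than blindly appending, the helper is safe to run repeatedly, e.g. on every provisioning pass.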