Tag Archive

Below you'll find a list of all posts that have been tagged as "Amazon EC2"

10 Steps to Setup and Manage a Hadoop Cluster Using Ironfan

Recently, we faced a unique challenge: setting up DevOps and management for a relatively complex Hadoop cluster on the Amazon EC2 cloud. The obvious choice was to use a configuration management tool, and having extensively used Opscode's Chef, with the flexibility and extensibility it provides, it was the natural candidate. While looking around for best practices to manage a Hadoop cluster with Chef, we stumbled upon Ironfan.

What is Ironfan?

In short, Ironfan, open-sourced by Infochimps, provides an abstraction on top of Chef that lets users easily provision, deploy, and manage a cluster of servers, be it a simple web application or a complex Hadoop cluster. After a few experiments, we were convinced that Ironfan was the right tool: it removes a lot of repetitive configuration while retaining the goodness of Chef. This blog shows how easy it is to set up and manage a Hadoop cluster using Ironfan.

Pre-requisites:
- A Chef account (Hosted or Private) with knife.rb set up correctly on your client machine.
- A working Ruby setup (via RVM or otherwise).

Installation:

Install Ironfan on your machine using the steps mentioned here. Once you have all the packages set up correctly, perform these sanity checks:
- The environment variable CHEF_USERNAME is your Chef Server username (unless your USER environment variable is the same as your Chef username).
- The environment variable CHEF_HOMEBASE points to the location that contains the expanded-out knife.rb.
- ~/.chef is a symbolic link to the knife directory in your CHEF_HOMEBASE.
- Your knife/knife.rb file is not modified.
- Your Chef user PEM file is in knife/credentials/{username}.pem.
- Your organization's Chef validator PEM file is in knife/credentials/{organization}-validator.pem.
- Your knife/credentials/knife-{organization}.rb file:
  - contains your Chef organization
  - contains the chef_server_url
  - contains the validation_client_name
  - contains the path to the validation_key
  - contains the aws_access_key_id / aws_secret_access_key
  - contains an AMI ID of an AMI you'd like to be able to boot in ec2_image_info

A sketch of such a credentials file is shown below.
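As a rough guide, a knife/credentials/knife-{organization}.rb file might look like the following minimal sketch. The organization name, server URL, key path, and credential values here are illustrative placeholders; follow the example file that ships with the Ironfan homebase for the exact layout, in particular for the ec2_image_info entry.

# knife/credentials/knife-example.rb (illustrative values only)
organization = 'example'                                    # your Chef organization

chef_server_url        "https://api.opscode.com/organizations/#{organization}"
validation_client_name "#{organization}-validator"
validation_key         "#{File.dirname(__FILE__)}/#{organization}-validator.pem"

# AWS credentials used when knife provisions EC2 servers
knife[:aws_access_key_id]     = 'YOUR_AWS_ACCESS_KEY_ID'
knife[:aws_secret_access_key] = 'YOUR_AWS_SECRET_ACCESS_KEY'

# Also register at least one bootable AMI in ec2_image_info, following the
# ec2_image_info examples that ship with the Ironfan homebase.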
Finally, in the homebase, rename the example_clusters directory to clusters. These are sample clusters that come with Ironfan. Then run a knife cluster list command:

$ knife cluster list
Cluster Path: /.../homebase/clusters
+----------------+-------------------------------------------------------+
| cluster        | path                                                  |
+----------------+-------------------------------------------------------+
| big_hadoop     | /.../homebase/clusters/big_hadoop.rb                  |
| burninator     | /.../homebase/clusters/burninator.rb                  |
...

Defining a Cluster:

Now let's define a cluster. A cluster in Ironfan is described by a single file that holds all the configuration essential for that cluster. You can customize your cluster spec as follows:
- Define cloud provider settings
- Define base roles
- Define the various facets
- Define facet-specific roles and recipes
- Override properties of a particular facet server instance

Defining cloud provider settings:

Ironfan currently supports the AWS and Rackspace cloud providers. We will take AWS as the example. For AWS you can provide configuration such as:
- The region in which the servers will be deployed
- The availability zones to be used
- EBS-backed or instance-store-backed servers
- The base image (AMI) used to spawn servers
- The security group with the allowed port range

Defining Base Roles:

You can define global roles for a cluster. These roles will be applied to all servers unless explicitly overridden for a particular facet or server. All the available roles are defined in the $CHEF_HOMEBASE/roles directory. You can create a custom role and use it in your cluster config.

Defining the Environment:

Environments in Chef provide a mechanism for managing different environments such as production, staging, development, and testing with one Chef setup (or one organization on Hosted Chef). With environments, you can specify per-environment run lists in roles, per-environment cookbook versions, and environment attributes. The available environments can be found in the $CHEF_HOMEBASE/environments directory. Custom environments can be created and used.

Ironfan.cluster 'my_first_cluster' do
  # Environment under which chef nodes will be placed
  environment :dev

  # Global roles for all servers
  role :systemwide
  role :ssh

  # Global ec2 cloud settings
  cloud(:ec2) do
    permanent true
    region 'us-east-1'
    availability_zones ['us-east-1c', 'us-east-1d']
    flavor 't1.micro'
    backing 'ebs'
    image_name 'ironfan-natty'
    chef_client_script 'client.rb'
    security_group(:ssh).authorize_port_range(22..22)
    mount_ephemerals
  end

  # facet definitions (described below) also go inside this block
end

Defining Facets:

Facets are groups of servers within a cluster that share common attributes and roles. For example, if your cluster has two app servers and two database servers, you can group the app servers under an app_server facet and the database servers under a database facet.

Defining facet-specific roles and recipes:

You can define roles and recipes particular to a facet. Even the global cloud settings can be overridden for a particular facet.

facet :master do
  instances 1
  recipe 'nginx'
  cloud(:ec2) do
    flavor 'm1.small'
    security_group(:web) do
      authorize_port_range(80..80)
      authorize_port_range(443..443)
    end
  end
  role :hadoop_namenode
  role :hadoop_secondarynn
  role :hadoop_jobtracker
  role :hadoop_datanode
  role :hadoop_tasktracker
end

facet :worker do
  instances 2
  role :hadoop_datanode
  role :hadoop_tasktracker
end

In the above example we have defined a facet for the Hadoop master node and a facet for the worker nodes. The number of instances for the master is set to 1 and for the workers to 2, and each facet has been assigned a set of roles. For the master facet we have overridden the EC2 flavor to m1.small, and its security group accepts incoming traffic on ports 80 and 443. Putting these pieces together, a complete cluster definition might look like the sketch that follows.
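For reference, here is one way those fragments could be assembled into a single cluster file. This is only a sketch that restates the snippets above; the cluster is named hadoop_job001 here simply to match the management commands in the next section.

Ironfan.cluster 'hadoop_job001' do
  # Chef environment for all nodes in this cluster
  environment :dev

  # Global roles applied to every server
  role :systemwide
  role :ssh

  # Global EC2 settings, overridable per facet
  cloud(:ec2) do
    permanent true
    region 'us-east-1'
    availability_zones ['us-east-1c', 'us-east-1d']
    flavor 't1.micro'
    backing 'ebs'
    image_name 'ironfan-natty'
    chef_client_script 'client.rb'
    security_group(:ssh).authorize_port_range(22..22)
    mount_ephemerals
  end

  # Single master node running the Hadoop master daemons
  facet :master do
    instances 1
    recipe 'nginx'
    cloud(:ec2) do
      flavor 'm1.small'
      security_group(:web) do
        authorize_port_range(80..80)
        authorize_port_range(443..443)
      end
    end
    role :hadoop_namenode
    role :hadoop_secondarynn
    role :hadoop_jobtracker
    role :hadoop_datanode
    role :hadoop_tasktracker
  end

  # Worker nodes running datanode and tasktracker
  facet :worker do
    instances 2
    role :hadoop_datanode
    role :hadoop_tasktracker
  end
end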
Cluster Management:

Now that the cluster configuration is ready, let's get hands-on with cluster management. All cluster configuration files are placed under the $CHEF_HOMEBASE/clusters directory; we will save our new config file as hadoop_job001_cluster.rb. Our new cluster should now appear in the cluster list.

List clusters:

$ knife cluster list
Cluster Path: /.../homebase/clusters
+---------------+--------------------------------------------+
| cluster       | path                                       |
+---------------+--------------------------------------------+
| hadoop_job001 | HOMEBASE/clusters/hadoop_job001_cluster.rb |
+---------------+--------------------------------------------+

Show the cluster configuration:

$ knife cluster show hadoop_job001
Inventorying servers in hadoop_job001 cluster, all facets, all servers
  hadoop_job001: Loading chef
  hadoop_job001: Loading ec2
  hadoop_job001: Reconciling DSL and provider information
+------------------------+-------+-------------+----------+------------+-----+
| Name                   | Chef? | State       | Flavor   | AZ         | Env |
+------------------------+-------+-------------+----------+------------+-----+
| hadoop_job001-master-0 | no    | not running | m1.small | us-east-1c | dev |
| hadoop_job001-worker-0 | no    | not running | t1.micro | us-east-1c | dev |
| hadoop_job001-worker-1 | no    | not running | t1.micro | us-east-1c | dev |
+------------------------+-------+-------------+----------+------------+-----+

Launch the whole cluster:

$ knife cluster launch hadoop_job001
Loaded information for 3 computer(s) in cluster hadoop_job001
+------------------------+-------+---------+----------+------------+-----+------------+---------------+--------------+------------+
| Name                   | Chef? | State   | Flavor   | AZ         | Env | MachineID  | Public IP     | Private IP   | Created On |
+------------------------+-------+---------+----------+------------+-----+------------+---------------+--------------+------------+
| hadoop_job001-master-0 | yes   | running | m1.small | us-east-1c | dev | i-c9e117b5 | 101.23.157.51 | 10.106.57.77 | 2012-12-10 |
| hadoop_job001-worker-0 | yes   | running | t1.micro | us-east-1c | dev | i-cfe117b3 | 101.23.157.52 | 10.106.57.78 | 2012-12-10 |
| hadoop_job001-worker-1 | yes   | running | t1.micro | us-east-1c | dev | i-cbe117b7 | 101.23.157.52 | 10.106.57.79 | 2012-12-10 |
+------------------------+-------+---------+----------+------------+-----+------------+---------------+--------------+------------+

Launch a single instance of a facet:

$ knife cluster launch hadoop_job001 master 0

Launch all instances of a facet:

$ knife cluster launch hadoop_job001 worker

Stop the whole cluster:

$ knife cluster stop hadoop_job001

Stop a single instance of a facet:

$ knife cluster stop hadoop_job001 master 0

Stop all instances of a facet:

$ knife cluster stop hadoop_job001 worker

Setting up and managing a Hadoop cluster cannot get much easier than this! To recap, Ironfan, open-sourced by Infochimps, is a systems provisioning and deployment tool that automates whole-system configuration for the Big Data stack, including tools for data ingestion, scraping, storage, computation, and monitoring. There is another tool we are exploring for Hadoop cluster management, Apache Ambari. We will post our findings and comparisons soon, stay tuned!

Aziro Marketing


How to Add Custom Metrics in Amazon CloudWatch?

Amazon CloudWatch is an Amazon Web Services utility for monitoring various components such as EC2 instances, EBS volumes, and the Elastic Load Balancer. For EC2 instances, we can monitor CPUUtilization, DiskReadBytes, DiskReadOps, DiskWriteBytes, NetworkIn, and NetworkOut. More often than not, end users want to monitor more parameters than the ones available, e.g. free memory, free swap, and so on. Amazon CloudWatch provides custom metrics to circumvent this problem: you simply define a custom metric based on your need and continuously feed it with data using a simple bash or Python script running a while loop.

Let's take the example of free memory. The aim is to define a custom metric for FreeMemory and continuously feed data to the metric from the machine that needs to be monitored.

Install and set up the AWS CloudWatch command line tools from http://aws.amazon.com/developertools/2534, then configure them as you would any AWS command line tool:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/jre/
export AWS_CLOUDWATCH_HOME=/opt/cloudwatch/   # or the location where you unzipped the utility
export PATH=$AWS_CLOUDWATCH_HOME/bin:$PATH

To define a new metric, e.g. FreeMemory:

ubuntu@domU-12-31-32-0B-01-A7:~$ mon-put-data -m "FreeMemory" --namespace Clogeny --dimensions "instance=i-f23233,servertype=MongoDB" --value 100 -u Bytes

This command creates the FreeMemory metric, which takes around 20 minutes to show up. The namespace and dimensions can be customized as per your needs. For now we have pushed a dummy value (the metric will eventually contain valid data), and the unit we chose is Bytes. More information about the options and the CloudWatch API is available at http://docs.amazonwebservices.com/AmazonCloudWatch/latest/DeveloperGuide/index.html?CLIReference.html

Once the metric is created, we need to continuously feed data to it. /proc/meminfo contains information about the current memory status of the system and can be used as the data source. Here is a simple Python script that feeds FreeMemory with the required data:

import commands

# Fetch the free memory from /proc/meminfo and convert it to bytes.
ret, cmdout = commands.getstatusoutput("cat /proc/meminfo | grep -e MemFree")
free_mem = str(int(cmdout.split()[1]) * 1024)

# Push the value to the FreeMemory metric; on execution the data is
# populated on the CloudWatch dashboard.
ret, cmdout = commands.getstatusoutput(
    'mon-put-data -m FreeMemory --namespace Clogeny '
    '--dimensions "instance=i-f23233,servertype=MongoDB" '
    '--value ' + free_mem + ' -u Bytes')

Run these commands in a loop and you have your own little agent providing a free-memory metric for your machine; a sketch of such a loop is shown at the end of this post. After running the agent for a while, you can see a similar graph in the AWS CloudWatch console.

Deleting a custom metric: A custom metric cannot be explicitly deleted. If the metric remains unused for 2 weeks, it gets deleted automatically.

Costing: $0.50 per metric per month.

Summary: You can see how easy it is to add a custom metric. In this example we added a FreeMemory metric; other useful metrics such as FreeSwap, ProcessAvailability, DiskSpace, etc. can be added the same way. Aziro (formerly MSys Technologies), as a leading AWS cloud services provider, can help you do that.
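The agent loop referred to above can live in any language. As a rough Ruby sketch, assuming mon-put-data is on the PATH, the environment variables above are exported, and reusing the same namespace, dimensions, and an arbitrary 60-second interval:

# Hypothetical agent loop; values mirror the mon-put-data example above.
loop do
  # MemFree in /proc/meminfo is reported in kB; convert to bytes.
  free_kb    = File.read('/proc/meminfo')[/^MemFree:\s+(\d+)/, 1].to_i
  free_bytes = free_kb * 1024

  # Push one data point to the FreeMemory custom metric.
  system('mon-put-data', '-m', 'FreeMemory',
         '--namespace', 'Clogeny',
         '--dimensions', 'instance=i-f23233,servertype=MongoDB',
         '--value', free_bytes.to_s,
         '-u', 'Bytes')

  sleep 60   # report roughly once a minute
end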

Aziro Marketing


How to Change the Date and Time on Amazon’s EC2 Instance

Amazon Elastic Compute Cloud (EC2) provides scalable virtual private servers using Xen. Instances running on Xen periodically sync their wall clock with the underlying hypervisor, so changing the date and time settings requires a little extra configuration.

On a plain Linux machine, the date and time can be changed simply by stopping the ntpd service and setting the date:

# date -s "2 OCT 2006 18:00:00"

On a Xen-based virtual instance it is not that simple: the command above will not throw any error, but it will not change the date either. To change the date on a Xen-based instance, you first need to set the wall clock to run independently of Xen:

echo 1 > /proc/sys/xen/independent_wallclock

To keep the setting across reboots, add the following line to the end of /etc/sysctl.conf:

xen.independent_wallclock = 1

If you later want to re-sync the wall clock with Xen, simply run:

echo 0 > /proc/sys/xen/independent_wallclock
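To tie the steps together, here is a small Ruby sketch of the same procedure, run as root. The date string is just the example above, and the ntp service name may differ by distribution:

# Hypothetical helper; assumes a Xen guest exposing independent_wallclock.
WALLCLOCK = '/proc/sys/xen/independent_wallclock'

# Detach the guest's wall clock from the Xen hypervisor.
File.write(WALLCLOCK, "1\n")

# Stop ntpd so it does not immediately re-sync the clock, then set the date.
system('service', 'ntpd', 'stop')
system('date', '-s', '2 OCT 2006 18:00:00')

# To return to hypervisor-synced time later:
# File.write(WALLCLOCK, "0\n")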

Aziro Marketing

EXPLORE ALL TAGS
2019 dockercon
Advanced analytics
Agentic AI
agile
AI
AI ML
AIOps
Amazon Aws
Amazon EC2
Analytics
Analytics tools
AndroidThings
Anomaly Detection
Anomaly monitor
Ansible Test Automation
apache
apache8
Apache Spark RDD
app containerization
application containerization
applications
Application Security
application testing
artificial intelligence
asynchronous replication
automate
automation
automation testing
Autonomous Storage
AWS Lambda
Aziro
Aziro Technologies
big data
Big Data Analytics
big data pipeline
Big Data QA
Big Data Tester
Big Data Testing
bitcoin
blockchain
blog
bluetooth
buildroot
business intelligence
busybox
chef
ci/cd
CI/CD security
cloud
Cloud Analytics
cloud computing
Cloud Cost Optimization
cloud devops
Cloud Infrastructure
Cloud Interoperability
Cloud Native Solution
Cloud Security
cloudstack
cloud storage
Cloud Storage Data
Cloud Storage Security
Codeless Automation
Cognitive analytics
Configuration Management
connected homes
container
Containers
container world 2019
container world conference
continuous-delivery
continuous deployment
continuous integration
Coronavirus
Covid-19
cryptocurrency
cyber security
data-analytics
data backup and recovery
datacenter
data protection
data replication
data-security
data-storage
deep learning
demo
Descriptive analytics
Descriptive analytics tools
development
devops
devops agile
devops automation
DEVOPS CERTIFICATION
devops monitoring
DevOps QA
DevOps Security
DevOps testing
DevSecOps
Digital Transformation
disaster recovery
DMA
docker
dockercon
dockercon 2019
dockercon 2019 san francisco
dockercon usa 2019
docker swarm
DRaaS
edge computing
Embedded AI
embedded-systems
end-to-end-test-automation
FaaS
finance
fintech
FIrebase
flash memory
flash memory summit
FMS2017
GDPR faqs
Glass-Box AI
golang
GraphQL
graphql vs rest
gui testing
habitat
hadoop
hardware-providers
healthcare
Heartfullness
High Performance Computing
Holistic Life
HPC
Hybrid-Cloud
hyper-converged
hyper-v
IaaS
IaaS Security
icinga
icinga for monitoring
Image Recognition 2024
infographic
InSpec
internet-of-things
investing
iot
iot application
iot testing
java 8 streams
javascript
jenkins
KubeCon
kubernetes
kubernetesday
kubernetesday bangalore
libstorage
linux
litecoin
log analytics
Log mining
Low-Code
Low-Code No-Code Platforms
Loyalty
machine-learning
Meditation
Microservices
migration
Mindfulness
ML
mobile-application-testing
mobile-automation-testing
monitoring tools
Mutli-Cloud
network
network file storage
new features
NFS
NVMe
NVMEof
NVMes
Online Education
opensource
openstack
opscode-2
OSS
others
Paas
PDLC
Positivty
predictive analytics
Predictive analytics tools
prescriptive analysis
private-cloud
product sustenance
programming language
public cloud
qa
qa automation
quality-assurance
Rapid Application Development
raspberry pi
RDMA
real time analytics
realtime analytics platforms
Real-time data analytics
Recovery
Recovery as a service
recovery as service
rsa
rsa 2019
rsa 2019 san francisco
rsac 2018
rsa conference
rsa conference 2019
rsa usa 2019
SaaS Security
san francisco
SDC India 2019
SDDC
security
Security Monitoring
Selenium Test Automation
selenium testng
serverless
Serverless Computing
Site Reliability Engineering
smart homes
smart mirror
SNIA
snia india 2019
SNIA SDC 2019
SNIA SDC INDIA
SNIA SDC USA
software
software defined storage
software-testing
software testing trends
software testing trends 2019
SRE
STaaS
storage
storage events
storage replication
Storage Trends 2018
storage virtualization
support
Synchronous Replication
technology
tech support
test-automation
Testing
testing automation tools
thought leadership articles
trends
tutorials
ui automation testing
ui testing
ui testing automation
vCenter Operations Manager
vCOPS
virtualization
VMware
vmworld
VMworld 2019
vmworld 2019 san francisco
VMworld 2019 US
vROM
Web Automation Testing
web test automation
WFH
