The components of the Hadoop ecosystem are as follows:
1) HBase
An open-source, distributed, versioned, column-oriented store.
It is based on Google's Bigtable.
2) Hive
It provides a warehouse structure for other Hadoop input sources and SQL-like access to data in HDFS.
Hive's query language, HiveQL, compiles to MapReduce and allows user-defined functions.
3) Pig
- Pig is a runtime environment that allows users to execute MapReduce on a Hadoop cluster.
4) Sqoop
- Sqoop is a tool that transfers data in both directions between relational systems and HDFS or other Hadoop data stores such as Hive or HBase.
5) Oozie
It is a job coordinator and workflow manager for jobs executed in Hadoop.
It is integrated with the rest of the Apache Hadoop stack.
6) Mahout
It is a scalable machine learning and data mining library.
Algorithms here can be executed in a distributed fashion.
7) ZooKeeper
- It is a distributed service with master and slave nodes for storing and maintaining configuration information, naming, providing distributed synchronization, and providing group services.
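Several of the components above (Hive, Pig, Mahout) ultimately run their work as MapReduce jobs. As a rough illustration of that model only, here is a minimal single-process sketch in plain Python. It does not use any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hive Pig Hive", "pig HBase"]))
```

On a real cluster the map and reduce steps run in parallel on many nodes; the sketch only shows how the data flows through the three stages.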
Hadoop Physical Architecture
Organizations store their data in the cloud to give users easy access.
Combining processor-based servers and storage, along with the networking resources used in cloud environments, with big data processing tools such as Apache Hadoop provides the high-performance computing power needed to analyze vast amounts of data efficiently and cost-effectively.
A Hadoop-compatible file system provides location awareness for effective scheduling of work.
A Hadoop application uses this information to find the data node that holds the data and send the task to it.
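The idea of location-aware scheduling can be sketched as follows. This is a simplified illustration, not Hadoop's actual scheduler: the function `pick_node` and its parameters are hypothetical, and the preference order (same node, then same rack, then any node) is an assumption about how such a scheduler typically behaves.

```python
def pick_node(block_replicas, free_slots, rack_of):
    """Choose a worker for a task that reads one block.

    block_replicas: set of nodes holding a copy of the block
    free_slots:     dict mapping node -> number of free task slots
    rack_of:        dict mapping node -> rack name
    """
    available = [node for node, slots in free_slots.items() if slots > 0]

    # 1. Prefer a node that already stores the block (no network transfer).
    for node in available:
        if node in block_replicas:
            return node, "node-local"

    # 2. Otherwise prefer a node on the same rack as one of the replicas.
    replica_racks = {rack_of[node] for node in block_replicas}
    for node in available:
        if rack_of[node] in replica_racks:
            return node, "rack-local"

    # 3. Fall back to any node with a free slot.
    if available:
        return available[0], "off-rack"
    return None, "no free slot"


if __name__ == "__main__":
    racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
    print(pick_node({"n1"}, {"n1": 0, "n2": 1, "n3": 1}, racks))
```

Running the task where the data already lives keeps block transfers over the network to a minimum, which is the point of location awareness.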
A small Hadoop cluster includes a single master and multiple worker nodes.
The master node consists of a job tracker, task tracker, name node and data node.
A worker node consists of a data node and a task tracker.
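The layout of a small cluster described above can be written down as a simple data structure. This is only a sketch for illustration: the `Node` class, the helper functions and the host names are hypothetical, and the daemon lists simply restate the roles named in the text.

```python
from dataclasses import dataclass

# Daemons per role, as listed in the description of a small cluster.
MASTER_DAEMONS = ("JobTracker", "TaskTracker", "NameNode", "DataNode")
WORKER_DAEMONS = ("DataNode", "TaskTracker")


@dataclass
class Node:
    hostname: str
    daemons: tuple


def small_cluster(master_host, worker_hosts):
    """Build one master node followed by the given worker nodes."""
    nodes = [Node(master_host, MASTER_DAEMONS)]
    nodes += [Node(host, WORKER_DAEMONS) for host in worker_hosts]
    return nodes


def hosts_running(cluster, daemon):
    """Return the hostnames of all nodes that run the given daemon."""
    return [node.hostname for node in cluster if daemon in node.daemons]


if __name__ == "__main__":
    cluster = small_cluster("master", ["worker1", "worker2"])
    print(hosts_running(cluster, "NameNode"))
    print(hosts_running(cluster, "DataNode"))
```

In this layout only the master runs the name node and job tracker, while every node, including the master, stores data and runs tasks.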
Limitations of Hadoop
* Security Concerns:
- Hadoop's security model is disabled by default due to its sheer complexity.
* Vulnerable by Nature:
- Written in Java, Hadoop is vulnerable by nature.
* Not fit for small data
* Potential stability issues:
- It is an open-source platform, so finding and using a stable version is a challenge.
* General limitations:
- Hadoop is not the only option; many other technologies are available for big data.