Write the limitations of Hadoop


Limitations of Hadoop:

  • Issue with small files - Hadoop is not suited to small files. HDFS (Hadoop Distributed File System) cannot efficiently support random reads of large numbers of small files because of its high-capacity design. A small file is one significantly smaller than the HDFS block size (128 MB by default). HDFS is designed to work properly with a small number of large files storing large data sets, not a large number of small files; if a huge number of small files is stored, the NameNode becomes overloaded, since it holds the entire HDFS namespace in memory (see the heap-usage sketch after this list).


  • Slow Processing Speed - MapReduce processes large data sets with a parallel, distributed algorithm. Every job involves two tasks, Map and Reduce, and MapReduce needs a lot of time to perform them, which increases latency. Because data is distributed over the cluster and processed in separate phases, overall processing speed is reduced.


  • Support for Batch Processing only - Hadoop supports only batch processing; it does not process streamed data, so overall performance is slower. The MapReduce framework also does not leverage the memory of the Hadoop cluster to the maximum.


  • No Real-time Data Processing - Apache Hadoop is designed for batch processing: it takes a huge amount of data as input, processes it and produces the result. Batch processing is very efficient for high volumes of data, but depending on the size of the data being processed and the computational power of the system, the output can be delayed significantly. Hadoop is therefore not suitable for real-time data processing.


  • Latency - The MapReduce framework is comparatively slow, since it is designed to support data of different formats and structures at huge volume. In MapReduce, Map takes a set of data and converts it into another set of data in which individual elements are broken down into key-value pairs; Reduce then takes the Map output as its input and processes it further. Performing these stages (and the shuffle between them) takes a lot of time, which increases latency (a minimal word-count sketch of the two stages is shown after this list).


  • No Delta Iteration - Hadoop is not efficient for iterative processing, because it does not support cyclic data flow (a chain of stages in which the output of each stage is the input to the next). Each iteration must run as a separate job that writes its results back to HDFS (see the driver-loop sketch after this list).


  • Security - Hadoop is challenging to secure for complex applications. If the user managing the platform does not know how to enable its security features, the data can be at serious risk. Hadoop lacks encryption at the storage and network levels, which is a major point of concern. It supports Kerberos authentication, which is hard to manage. HDFS supports access control lists (ACLs) and a traditional file-permissions model, and third-party vendors have enabled organizations to leverage Active Directory Kerberos and LDAP for authentication.


  • Not Easy to Use - MapReduce developers need to hand-code each and every operation, which makes development difficult. MapReduce has no interactive mode, although tools built on top of it, such as Hive and Pig, make working with it somewhat easier for adopters.
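
The small-files point can be made concrete with a rough heap estimate. The sketch below assumes the commonly cited rule of thumb of about 150 bytes of NameNode heap per namespace object (one entry for the file plus one per block); the exact figure varies by Hadoop version and configuration, so treat the output as order-of-magnitude only.

```python
# Back-of-the-envelope sketch: why many small files strain the NameNode.
# Assumption: ~150 bytes of NameNode heap per namespace object
# (one file entry plus one entry per block).

BYTES_PER_OBJECT = 150            # assumed heap cost per file/block entry
BLOCK_SIZE = 128 * 1024 * 1024    # default HDFS block size (128 MB)

def namenode_heap_estimate(num_files: int, avg_file_size: int) -> int:
    """Estimate NameNode heap (in bytes) needed to track num_files files."""
    blocks_per_file = max(1, -(-avg_file_size // BLOCK_SIZE))  # ceiling division
    objects = num_files * (1 + blocks_per_file)                # file + block entries
    return objects * BYTES_PER_OBJECT

# Roughly 10 TB stored as ten million 1 MB files vs. ~80,000 files of 128 MB:
small = namenode_heap_estimate(10_000_000, 1 * 1024 * 1024)
large = namenode_heap_estimate(80_000, 128 * 1024 * 1024)
print(f"ten million small files: ~{small / 2**30:.1f} GB of NameNode heap")
print(f"eighty thousand large files: ~{large / 2**30:.3f} GB of NameNode heap")
```

The same 10 TB of data costs the NameNode roughly a hundred times more memory when split into 1 MB files, which is why HDFS prefers a small number of large files.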
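To illustrate the Map and Reduce stages mentioned under Latency (and why even trivial operations need hand-written code), here is a minimal Hadoop Streaming-style word-count sketch in Python. The file names mapper.py and reducer.py and any input/output paths are placeholders.

```python
#!/usr/bin/env python3
# mapper.py -- minimal Hadoop Streaming-style mapper (word count sketch).
# Reads input lines from stdin and emits tab-separated key/value pairs.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- minimal Hadoop Streaming-style reducer (word count sketch).
# Hadoop delivers the mapper output grouped (sorted) by key, so counts for
# the same word arrive on consecutive lines.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

With assumed paths, the pair could be submitted roughly as `hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /input -output /output`. Every intermediate key-value pair is written to disk and shuffled between the two stages, which is a large part of MapReduce's latency for small or interactive workloads.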
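For the iteration point, the sketch below outlines a driver loop that launches one MapReduce job per iteration. The helper run_mapreduce_job, the jar name and the HDFS paths are hypothetical stand-ins for whatever job-submission mechanism is used; the point is structural: each iteration round-trips through HDFS because there is no cyclic, in-memory data flow.

```python
# Illustrative sketch (not a real Hadoop API): iterative algorithms on plain
# MapReduce must chain separate jobs, re-reading and re-writing HDFS each time.
import subprocess

def run_mapreduce_job(input_path: str, output_path: str) -> None:
    # Hypothetical helper: submits one job, e.g. via `hadoop jar ...`.
    subprocess.run(
        ["hadoop", "jar", "my-iterative-job.jar", input_path, output_path],
        check=True,
    )

current = "/data/pagerank/iter-0"
for i in range(1, 11):                      # e.g. ten PageRank-style iterations
    next_path = f"/data/pagerank/iter-{i}"
    run_mapreduce_job(current, next_path)   # full job launch + HDFS round trip
    current = next_path                     # next iteration re-reads from disk
```

Engines that support cyclic or in-memory data flow avoid this per-iteration job launch and disk round trip, which is exactly what plain MapReduce lacks.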