How can Hadoop help you?
Hadoop is a software framework that offers support to data-intensive distributed
applications. It is open-source software that enables applications to work with
multiple nodes and petabytes of data.
It is the most
popular Big Data technology, developed along the lines of Google’s
MapReduce and Google File System (GFS) papers. It provides the resources
required to use an enormous cluster of computers to store large amounts of
data that can be processed in parallel.
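To make the MapReduce idea behind Hadoop concrete, here is a minimal single-process sketch of the model. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative only, not Hadoop APIs; a real Hadoop job would implement Mapper and Reducer classes in Java and run across many nodes.

```python
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts emitted for each word.
    return (key, sum(values))

documents = ["Hadoop stores data", "Hadoop processes data in parallel"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["hadoop"])  # 2
print(counts["data"])    # 2
```

In a real cluster the map calls run in parallel on the nodes holding the data, and the shuffle moves intermediate pairs between machines; the logic, however, is the same.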
What problems can Hadoop solve:
The Hadoop platform was designed to solve problems
where you have a lot of data — perhaps a mixture of complex and structured data
— and it doesn’t fit nicely into tables. It’s for situations where you want to
run analytics that are deep and computationally extensive, like clustering and
targeting. That’s exactly what Google was doing when it was indexing the web
and examining user behavior to improve performance algorithms.
Hadoop applies to a
bunch of markets. In finance, if you want to do accurate portfolio evaluation
and risk analysis, you can build sophisticated models that are hard to jam into
a database engine. But Hadoop can handle it. In online retail, if you want to
deliver better search answers to your customers so they’re more likely to buy
the thing you show them, that sort of problem is well addressed by the platform
Google built.
Why use Hadoop?
Hadoop is used where
a large amount of data is generated and your business requires insights
from that data. The power of Hadoop lies in its framework: most analysis and
visualization software can be plugged into it. It
can scale from one system to thousands of systems in a cluster, and those
systems can be low-end commodity machines. Hadoop does not depend upon hardware
for high availability; fault tolerance is handled in software. Two primary reasons to use
Hadoop:
·
The cost savings with Hadoop
are dramatic compared to legacy systems.
·
It has robust community
support that keeps evolving with new advancements.
How does Hadoop work?
As mentioned earlier, Hadoop is an ecosystem of
libraries, and each library has its own dedicated tasks to perform. HDFS writes
data to the server once and then reads and reuses it many times. Compared
with the continuous read-and-write cycles of other file systems, this
write-once, read-many model accounts for much of the speed with which Hadoop
works, and it makes HDFS well suited to storing a large volume and variety of data.
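The write-once, read-many model can be sketched as follows. The `WriteOnceFile` class here is hypothetical, invented for illustration; it is not the real HDFS client API, but it captures the contract: one write, then any number of reads, with overwrites rejected.

```python
class WriteOnceFile:
    """Toy model of an HDFS-style write-once, read-many file."""

    def __init__(self):
        self._data = None

    def write(self, data):
        # A second write is rejected, mirroring HDFS's no-overwrite rule.
        if self._data is not None:
            raise IOError("write-once file: overwrites are not allowed")
        self._data = data

    def read(self):
        # Reads can happen any number of times.
        return self._data

f = WriteOnceFile()
f.write(b"block of data")
assert f.read() == b"block of data"   # read once...
assert f.read() == b"block of data"   # ...and reuse many times
try:
    f.write(b"new data")              # second write is rejected
except IOError as e:
    print("rejected:", e)
```

Because files never change after they are written, HDFS can replicate blocks freely and serve many concurrent readers without coordination overhead.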
The JobTracker is the master
node that manages all the TaskTracker slave nodes and schedules the jobs they
execute. Whenever some data is required, a request is sent to the NameNode, the
master node (the smart node of the cluster) of HDFS, which manages all the
DataNode slave nodes. The request is passed on to the DataNodes that serve the
required data. Hadoop also has the concept of a heartbeat: a periodic signal
that every slave node sends to its master node to indicate that the slave node
is alive.
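The heartbeat mechanism can be illustrated with a small simulation. The class and timeout below are hypothetical, not Hadoop internals: each slave reports to the master periodically, and a node whose last heartbeat is older than the timeout is treated as dead.

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds of silence before a node is presumed dead

class Master:
    """Toy master node tracking heartbeats from its slave nodes."""

    def __init__(self):
        self.last_heartbeat = {}  # node id -> timestamp of last heartbeat

    def receive_heartbeat(self, node_id, now=None):
        # Record when this slave last checked in.
        self.last_heartbeat[node_id] = now if now is not None else time.time()

    def live_nodes(self, now=None):
        # A node is live if its last heartbeat is within the timeout window.
        now = now if now is not None else time.time()
        return [n for n, t in self.last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT]

master = Master()
master.receive_heartbeat("datanode-1", now=100.0)
master.receive_heartbeat("datanode-2", now=100.0)
master.receive_heartbeat("datanode-1", now=108.0)  # datanode-2 goes silent
print(master.live_nodes(now=112.0))  # ['datanode-1']
```

In real Hadoop, the NameNode and JobTracker use exactly this kind of timeout to detect failed DataNodes and TaskTrackers and to reassign their work elsewhere.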
Hadoop Uses:
Hadoop’s uses span
diverse markets: whether a retailer wants to deliver effective search answers
to a customer’s query or a financial firm wants to do accurate portfolio
evaluation and risk analysis, Hadoop can address all of these problems.
Today, the whole world is crazy about social networking and online shopping, so
let’s take a look at Hadoop’s uses from these two perspectives.
Hadoop is used
extensively at Facebook, which stores close to 250 billion photos, with 350
million new photos uploaded every day. Facebook uses Hadoop in multiple ways:
·
Facebook uses Hadoop and
Hive to generate reports for advertisers that help them track the success of
their advertising campaigns.
·
Facebook’s Messaging app runs
on top of HBase, Hadoop’s NoSQL database.
·
Facebook uses Hive on Hadoop
for faster querying on various graph tools.