Hadoop Architecture Quick Insight

Hadoop

The layer supporting the storage layer, that is, the physical infrastructure, is fundamental to the operation and
scalability of a big data architecture. In fact, the availability of robust and inexpensive physical infrastructure is what
triggered the emergence of big data as such an important trend. To support unanticipated or unpredictable volume,
velocity, or variety of data, a physical infrastructure for big data has to be different from that for traditional data.

The Hadoop physical infrastructure layer (HPIL) is based on a distributed computing model. This means that
data can be physically stored in many different locations and linked together through networks and a distributed file
system. It is a “shared-nothing” architecture, where the data and the functions required to manipulate it reside together
on the same node. Unlike in the traditional client-server model, the data no longer needs to be transferred to a monolithic
server where SQL functions are applied to crunch it. Redundancy is built into this infrastructure because you are
dealing with so much data from so many different sources.
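
To make the data-locality idea concrete, here is a minimal sketch in Java using the HDFS client API: it asks the NameNode where the blocks of a file live, which is exactly the information a scheduler uses to ship computation to the data rather than the data to the computation. The cluster address hdfs://namenode:9000 and the path /data/events.log are hypothetical placeholders for this example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/events.log");         // hypothetical file
        FileStatus status = fs.getFileStatus(file);

        // Each block reports the hosts holding a replica; a scheduler can run
        // the processing function on one of those hosts instead of moving the data.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}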

 

Traditional enterprise applications are built on vertically scaled hardware and software. Traditional enterprise
architectures are designed to provide strong transactional guarantees, but they trade away scalability and are
expensive; vertically scaled architectures are too costly to economically support dense computation
over large-scale data. Auto-provisioned, virtualized data center resources enable horizontal scaling of data platforms
at significantly lower cost. Hadoop and HDFS can manage the infrastructure layer in a virtualized cloud
environment (on-premises as well as in a public cloud) or on a distributed grid of commodity servers over a fast gigabit
network.
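
Since Hadoop and HDFS handle replica placement across those commodity nodes themselves, a client only has to ask for redundancy. The following Java sketch writes a file to HDFS with a replication factor of 3 (the usual HDFS default), so each block lands on three different nodes and any single node can fail without data loss. The NameNode address and file path are again assumptions for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address
        conf.setInt("dfs.replication", 3);                // three replicas per block

        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(new Path("/data/hello.txt"))) {
            out.writeUTF("stored once, replicated three times");
        }
        fs.close();
    }
}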
A simple big data hardware configuration using commodity servers is illustrated below.

Typical big data hardware topology

 

The configuration pictured includes the following components: N commodity servers (8 cores, 24 GB of RAM,
4 to 12 TB of disk, and Gigabit Ethernet each), connected by a two-level network with 20 to 40 nodes per rack.
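
For a rough sense of what such a topology yields, here is an illustrative back-of-the-envelope calculation. The rack count, node count, and disk sizes below are assumptions drawn from the ranges above, and usable capacity is computed against HDFS's default replication factor of 3.

public class ClusterCapacity {
    public static void main(String[] args) {
        int racks = 2;                  // hypothetical two-rack cluster
        int nodesPerRack = 40;          // upper end of the 20-40 range
        double tbPerNode = 12.0;        // upper end of the 4-12 TB range
        int replicationFactor = 3;      // each block is stored three times

        double rawTb = racks * nodesPerRack * tbPerNode;  // 960 TB raw
        double usableTb = rawTb / replicationFactor;      // ~320 TB usable
        System.out.printf("raw: %.0f TB, usable after 3x replication: %.0f TB%n",
                rawTb, usableTb);
    }
}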

 

There are a few more good examples to read through, such as the Hadoop from an Infrastructure Perspective article, which sheds light on the whole architecture of Hadoop. For further reading, you can download the Hadoop architecture document from Apache.

There is also a nice tutorial for learning Hadoop provided by Yahoo.

 
