
Monday 10 August 2015

Hadoop Tutorial



To understand Hadoop, you have to understand two fundamental things about it. Imagine you had a file that was larger than your PC's storage capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on any one particular node or server, so you can store very, very large files. It also lets you store many, many files. Mainstream business users don't need to know how Hadoop works under the hood, but they do need to understand that the constraints they once had on storing and processing data are removed once Hadoop is installed: the business can start thinking big again when it comes to data. There is less confusion about this than there was 12 months ago; executives just know that it is a big data technology, and that is enough for them.
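To make that concrete: HDFS splits every large file into blocks and spreads them across the cluster's DataNodes, which is why a single file can outgrow any single machine. Here is a minimal sketch using Hadoop's Java FileSystem API to print where each block of a file lives. The path /data/huge.log is a hypothetical example, and the sketch assumes a running HDFS cluster reachable through the configuration files on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file; in practice, any large file already in HDFS
        Path file = new Path("/data/huge.log");
        FileStatus status = fs.getFileStatus(file);

        // Each block of the file may live on a different set of nodes
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                block.getOffset(), block.getLength(),
                String.join(",", block.getHosts()));
        }
    }
}
```

Each printed line corresponds to one block, and the host list shows the DataNodes holding replicas of that block.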

The second characteristic of Hadoop is its ability to process that data, or at least to provide a framework for processing it. That framework is called MapReduce. Moving data over a network can be very, very slow, especially for really large data sets; imagine opening a really big file on your laptop, which takes much longer than opening a tiny one. So rather than dragging the data across the network to the program, MapReduce ships the program to the nodes where the data already sits and processes it there.
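The canonical first example of MapReduce is word count: the map step runs on the nodes that already hold the data and emits a (word, 1) pair for every word it sees, and the reduce step adds up the ones for each word. The following is a condensed sketch against the standard Hadoop MapReduce Java API, not a production-grade job:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: runs next to each block of input, emits (word, 1) pairs
    public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Reduce: receives every count emitted for one word, sums them
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-sums on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Submitted with `hadoop jar`, the two arguments are HDFS input and output paths. Only the intermediate (word, count) pairs travel across the network during the shuffle, not the raw input, which is the whole point of moving computation to the data.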

Apache™ Hadoop® is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of data. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in the case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
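That tolerance comes largely from replication: HDFS keeps multiple copies of every block (three by default, controlled by dfs.replication), so when a node dies the NameNode simply re-replicates its blocks from the surviving copies. Here is a small sketch of raising the replication factor for one file through the Java API; the path and the factor of 5 are illustrative assumptions, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical example file already stored in HDFS
        Path critical = new Path("/data/critical.log");

        // Ask the NameNode to keep 5 copies of each block of this file
        // instead of the cluster default (dfs.replication, usually 3)
        boolean accepted = fs.setReplication(critical, (short) 5);
        System.out.println("replication change accepted: " + accepted);
    }
}
```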

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related projects such as Apache Hive, HBase and ZooKeeper.

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating system is Linux, but Hadoop also runs on Windows, BSD and OS X.

Hadoop Video Tutorial from http://onlinetrainingcources.blogspot.in/

- The State of Data
- Hadoop
- Series Layout

- Hadoop Core
- Hadoop Projects
- Hadoop Incubator
- Stack Implementation

- HDFS Architecture
- HDFS Internals
- HDFS Interaction

- MapReduce Architecture
- MapReduce Internals
- MapReduce Example

- Installation Overview
- Installing Hadoop
- Hadoop Daemons

- Cluster Configurations
- Configuring Masters
- Configuring Slaves
- Cluster Essentials

- Hadoop Troubleshooting
- Hadoop Administration
- Hadoop Optimization

- Data Data Data
- HDFS Interaction
- HDFS Management
- Upgrade Process
- Rack Awareness

- Development Overview
- Configuring IDE Projects
- Writing & Testing Jobs
- Running Jobs Against Clusters

- Pig Overview
- Pig vs SQL
- Pig Latin
- Installing Pig

- Loading & Storing
- Filtering & Transforming
- Grouping & Sorting
- Combining & Splitting
- User Defined Functions
- Debugging/Diagnostics

- Hive Overview
- HiveQL Overview
- Hive Installation
- Hive Example

- Creating Tables
- Loading Data
- Creating Views
- Creating Indexes

- HBase Overview
- HBase Architecture
- HBase Installation
- HBase Admin Test

- HBase Client Loading Overview
- Fully Distributed HBase Configuration
- Loading HBase
- HBase Data Access

- ZooKeeper Overview
- ZooKeeper Architecture
- ZooKeeper Installation

- Sqoop Overview
- Sqoop Installation
- Importing Data
- Exporting Data

- Cloudera CDH Overview
- Getting Started with Cloudera CDH VM
- Cloudera CDH VM Walkthrough

- Amazon EMR Overview
- Loading S3
- Running EMR Job Flows

- Microsoft HDInsight Overview
- Provisioning An HDInsight Cluster
- Administering an HDInsight Cluster & Running Jobs
