
Monday 10 August 2015

Hadoop Tutorial



To understand Hadoop, you have to understand two fundamental things about it. Imagine you had a file that was larger than your PC's storage capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on any one particular node or server, so you can store very, very large files. It also lets you store many, many files. Mainstream business users don't need to know how Hadoop works under the hood, but they do need to understand that the constraints they once had on storing and processing data are removed once Hadoop is installed: the business can start thinking big again when it comes to data. There is less confusion about this than there was 12 months ago; executives just know that it is a big data technology, and that is enough for them.
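To make that concrete: HDFS splits every large file into blocks and spreads them across the cluster's DataNodes, which is why a single file can outgrow any single machine. Here is a minimal sketch using Hadoop's Java FileSystem API to print where each block of a file lives. The path /data/huge.log is a hypothetical example, and the sketch assumes a running HDFS cluster reachable through the configuration files on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file; in practice, any large file already in HDFS
        Path file = new Path("/data/huge.log");
        FileStatus status = fs.getFileStatus(file);

        // Each block of the file may live on a different set of nodes
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                block.getOffset(), block.getLength(),
                String.join(",", block.getHosts()));
        }
    }
}
```

Each printed line corresponds to one block, and the host list shows the DataNodes holding replicas of that block.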

The second characteristic of Hadoop is its ability to process that data, or at least to provide a framework for processing it. That framework is called MapReduce. Moving data over a network can be very, very slow, especially for really large data sets; imagine opening a really big file on your laptop, which takes much longer than opening a tiny one. So rather than dragging the data across the network to the program, MapReduce ships the program to the nodes where the data already sits and processes it there.
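The canonical first example of MapReduce is word count: the map step runs on the nodes that already hold the data and emits a (word, 1) pair for every word it sees, and the reduce step adds up the ones for each word. The following is a condensed sketch against the standard Hadoop MapReduce Java API, not a production-grade job:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: runs next to each block of input, emits (word, 1) pairs
    public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // Reduce: receives every count emitted for one word, sums them
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-sums on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Submitted with `hadoop jar`, the two arguments are HDFS input and output paths. Only the intermediate (word, count) pairs travel across the network during the shuffle, not the raw input, which is the whole point of moving computation to the data.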

Apache™ Hadoop® is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of data. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in the case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
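That tolerance comes largely from replication: HDFS keeps multiple copies of every block (three by default, controlled by dfs.replication), so when a node dies the NameNode simply re-replicates its blocks from the surviving copies. Here is a small sketch of raising the replication factor for one file through the Java API; the path and the factor of 5 are illustrative assumptions, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical example file already stored in HDFS
        Path critical = new Path("/data/critical.log");

        // Ask the NameNode to keep 5 copies of each block of this file
        // instead of the cluster default (dfs.replication, usually 3)
        boolean accepted = fs.setReplication(critical, (short) 5);
        System.out.println("replication change accepted: " + accepted);
    }
}
```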

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related projects such as Apache Hive, HBase and ZooKeeper.

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating system is Linux, but Hadoop also runs on Windows, BSD and OS X.

Hadoop Video Tutorial from http://onlinetrainingcources.blogspot.in/

- The State of Data
- Hadoop
- Series Layout

- Hadoop Core
- Hadoop Projects
- Hadoop Incubator
- Stack Implementation

- HDFS Architecture
- HDFS Internals
- HDFS Interaction

- MapReduce Architecture
- MapReduce Internals
- MapReduce Example

- Installation Overview
- Installing Hadoop
- Hadoop Daemons

- Cluster Configurations
- Configuring Masters
- Configuring Slaves
- Cluster Essentials

- Hadoop Troubleshooting
- Hadoop Administration
- Hadoop Optimization

- Data Data Data
- HDFS Interaction
- HDFS Management
- Upgrade Process
- Rack Awareness

- Development Overview
- Configuring IDE Projects
- Writing & Testing Jobs
- Running Jobs Against Clusters

- Pig Overview
- Pig vs SQL
- Pig Latin
- Installing Pig

- Loading & Storing
- Filtering & Transforming
- Grouping & Sorting
- Combining & Splitting
- User Defined Functions
- Debugging/Diagnostics

- Hive Overview
- HiveQL Overview
- Hive Installation
- Hive Example

- Creating Tables
- Loading Data
- Creating Views
- Creating Indexes

- HBase Overview
- HBase Architecture
- HBase Installation
- HBase Admin Test

- HBase Client Loading Overview
- Fully Distributed HBase Configuration
- Loading HBase
- HBase Data Access

- ZooKeeper Overview
- ZooKeeper Architecture
- ZooKeeper Installation

- Sqoop Overview
- Sqoop Installation
- Importing Data
- Exporting Data

- Cloudera CDH Overview
- Getting Started with Cloudera CDH VM
- Cloudera CDH VM Walkthrough

- Amazon EMR Overview
- Loading S3
- Running EMR Job Flows

- Microsoft HDInsight Overview
- Provisioning An HDInsight Cluster
- Administering an HDInsight Cluster & Running Jobs
