Big Data/Hadoop Learning

A forum where you can get to know Hadoop concepts and tools.

30/03/2017

DFS and HDFS

A Distributed File System (DFS) works on a master and slave model. Files are distributed across the slave nodes available in the DFS. However, it does not support fault tolerance, so data on a slave node cannot be retrieved if that node goes down, for example because of a network issue. A DFS uses highly configured nodes to form its cluster, so it is expensive, which makes it challenging for small-scale industries to use for storing their data.

The Hadoop Distributed File System (HDFS) also works on a master and slave mechanism. Files are distributed across the slave nodes. HDFS keeps replicas of the data held on each slave node elsewhere in the cluster, so the data can still be retrieved even when a slave node goes down. It scales easily from one node to many nodes. An HDFS cluster can be formed from commodity hardware, which lets users store their data at low cost.
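The effect of replication can be illustrated with a toy model. The sketch below is not the HDFS API: the function names `place_blocks` and `readable_blocks`, the round-robin placement and the node names are all assumptions made for illustration. It places each block on one or more nodes and checks which blocks are still readable after a node fails.

```python
def place_blocks(blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy for illustration only; it is not
    how a real HDFS cluster decides where replicas go.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


blocks = ["blk_1", "blk_2", "blk_3", "blk_4"]
nodes = ["node_a", "node_b", "node_c"]

# One copy per block: losing a node loses every block stored on it.
single = place_blocks(blocks, nodes, replication=1)
# Two copies per block: any single node failure is survivable.
replicated = place_blocks(blocks, nodes, replication=2)

print(sorted(readable_blocks(single, ["node_a"])))
print(sorted(readable_blocks(replicated, ["node_a"])))
```

With a single copy, the blocks that lived on the failed node are gone; with two copies, every block is still readable from another node.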

30/03/2017

Hadoop Framework

Doug Cutting and Mike Cafarella started research on the solution published by Google (the MapReduce algorithm), and in 2005 they named their project Hadoop. Hadoop runs applications based on the MapReduce algorithm, in which data is processed in parallel on different machines (nodes). It is used to perform statistical analysis on huge volumes of data.

Hadoop is an Apache open source framework written in Java. It distributes the data across the nodes available in the cluster and then processes it. Hadoop can scale from a single machine to many machines, and each machine is capable of both storage and processing.

The Hadoop modules are given below.

Hadoop Common : The Java libraries and utilities required by the other modules.

Hadoop YARN : A framework for job scheduling and cluster resource management.

Hadoop Distributed File System : Purely meant for storage. HDFS stores the data that will go for processing.

Hadoop MapReduce : Purely meant for processing. The MapReduce algorithm is applied to the data stored on HDFS.

21/03/2017

Traditional approach to storing and processing data

In the traditional approach, every organisation has a computer to store and process its own data. The data is stored in an RDBMS such as Oracle, SQL Server, MySQL, MariaDB or SQLite. An application can be written to interact with the database and process the desired data from the RDBMS.
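As a minimal sketch of this approach, the snippet below uses Python's built-in `sqlite3` module, with an in-memory SQLite database standing in for the organisation's RDBMS. The `sales` table and its rows are made-up sample data, not taken from any real system.

```python
import sqlite3

# An in-memory SQLite database stands in for the organisation's RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 50.0)],  # sample rows
)

# The application asks the database to process the data it needs.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

print(totals)
```

Everything here happens on one machine: the same computer stores the table and runs the query.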

This approach works well when the volume of data fits within the RDBMS and the processor is able to handle it. But the traditional approach is not suitable for huge volumes of data, because a single RDBMS and a single processor make large-volume data processing a tedious task.



Google solved this issue using an algorithm called MapReduce. This algorithm splits the original task into a number of small tasks and assigns those tasks to a number of computers connected in a network, so the processing happens on all the computers at once. Finally, the results from every individual computer are collected to form the final result dataset.
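The split, process and collect steps can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop's MapReduce API; the helper names (`split_into_chunks`, `map_phase`, `shuffle`, `reduce_phase`) and the sample lines are assumptions chosen for this example.

```python
from collections import defaultdict


def split_into_chunks(lines, num_workers):
    """Split the input into roughly equal chunks, one per worker."""
    return [lines[i::num_workers] for i in range(num_workers)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key across every worker's output."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


lines = [
    "big data needs big storage",
    "hadoop stores big data",
    "hadoop processes data",
]

chunks = split_into_chunks(lines, num_workers=2)
# Each chunk is mapped independently; on a real cluster these calls
# would run on different machines. Here they run one after another.
mapped = [map_phase(chunk) for chunk in chunks]
word_counts = reduce_phase(shuffle(mapped))

print(word_counts)
```

Because each chunk is mapped without looking at the others, the map step is the part that can be spread across many computers.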

16/03/2017

The space that data consumes in a storage system is measured in bytes. Data that consumes more than one petabyte is referred to as Big Data.

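The hierarchy of bytes is given below. In the binary convention, each unit is 1024 times the previous one.

1 Kilobyte (KB) = 1024 Bytes
1 Megabyte (MB) = 1024 KB
1 Gigabyte (GB) = 1024 MB
1 Terabyte (TB) = 1024 GB
1 Petabyte (PB) = 1024 TB
1 Exabyte (EB) = 1024 PB
1 Zettabyte (ZB) = 1024 EB
1 Yottabyte (YB) = 1024 ZB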
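The arithmetic behind the byte hierarchy can be sketched in a few lines. The helper name `human_readable` is hypothetical, and the code assumes the binary convention in which each unit is 1024 times the previous one.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes, base=1024):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below the base,
        # or at the last unit if the number is larger than the list covers.
        if value < base or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= base


one_petabyte = 1024 ** 5  # bytes in one petabyte under this convention

print(human_readable(one_petabyte))   # 1.00 PB
print(human_readable(3 * 1024 ** 4))  # 3.00 TB
```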

16/03/2017

The different types of data in a Hadoop environment are listed below.

Structured Data : Relational data and tables

Semi-Structured Data : XML data

Unstructured Data : PDFs, Word or text documents, application logs
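To make the three categories concrete, here is a small sketch that reads one example of each using only Python's standard library. All of the sample values (names, cities, the log line) are invented for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields.
table_text = "id,name,city\n1,Asha,Chennai\n2,Ravi,Madurai\n"
rows = list(csv.DictReader(io.StringIO(table_text)))

# Semi-structured: tags describe the data, but elements may differ.
# The second <user> has no <name>, so findtext returns None for it.
xml_text = "<users><user id='1'><name>Asha</name></user><user id='2'/></users>"
root = ET.fromstring(xml_text)
user_names = [user.findtext("name") for user in root.findall("user")]

# Unstructured: free text; any structure has to be extracted,
# here with a regular expression written for this one log format.
log_line = "2017-03-16 10:15:02 ERROR Disk quota exceeded on /data"
match = re.match(r"(\S+) (\S+) (\w+) (.*)", log_line)
log_level = match.group(3)

print(rows[1]["city"])
print(user_names)
print(log_level)
```

The structured rows can be addressed by column name straight away, the XML needs a check for missing elements, and the log line needs a pattern written by hand.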

16/03/2017

What are the 3 Vs in big data?

Volume, Variety and Velocity are the 3 Vs of big data.

Data Volume refers to the amount of data that needs to be processed. Data Variety refers to the different types of data, such as structured, semi-structured and unstructured. Data Velocity refers to the speed at which data arrives and has to be processed.

These 3 properties should be handled effectively to manage big data.

16/03/2017

Big data means data that is huge in volume. Data measured in petabytes can be called "big data". Big data is a collection of huge volumes of data that can't be processed or analysed using traditional computing techniques. Sources of big data are listed below.

Social media, e-commerce websites, telecommunications organisations, share market data, government census portals, etc.
