DFS and HDFS
Distributed File System (DFS) works on a master and slave concept. Files are distributed across the slave nodes available in the DFS. However, it does not support fault tolerance, so the data stored on a slave node cannot be retrieved if that node goes down, for example because of a network issue. DFS uses high-end (highly configured) nodes to form its cluster, which makes it costly and therefore challenging for small-scale industries to use for storing their data.
Hadoop Distributed File System (HDFS) also works on a master and slave mechanism, and files are likewise distributed across the slave nodes. HDFS keeps replicas of every data block on several slave nodes (three by default), so the data can still be retrieved from the cluster even when a slave node goes down. It scales easily from a single node to many nodes. An HDFS cluster can be built from commodity hardware, which lets users store their data at low cost.
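The difference between the two systems comes down to replication. The toy simulation below is not HDFS code: it splits a file into blocks using the HDFS default block size of 128 MB (Hadoop 2.x and later), places each block on one node (DFS-style, no replicas) or on three nodes (the HDFS default replication factor), and then checks whether every block survives the failure of a single node. The function names are hypothetical, and real HDFS uses a rack-aware placement policy rather than the simple round-robin used here.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (dfs.blocksize) in Hadoop 2.x+

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, nodes, replication):
    """Hypothetical round-robin placement: each block goes to `replication`
    distinct nodes (assumes replication <= len(nodes)). Not HDFS's real policy."""
    placement = {}
    for block_id in range(num_blocks):
        first = block_id % len(nodes)
        placement[block_id] = [nodes[(first + i) % len(nodes)] for i in range(replication)]
    return placement

def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed_node for node in replicas)
               for replicas in placement.values())

nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file -> 3 blocks
no_replicas = place_replicas(len(blocks), nodes, replication=1)
hdfs_like = place_replicas(len(blocks), nodes, replication=3)
print(readable_after_failure(no_replicas, "node1"))  # False: block 0 lived only on node1
print(readable_after_failure(hdfs_like, "node1"))    # True: other replicas remain
```

With a single copy per block, losing one node loses data; with three copies on different nodes, any one node can fail and every block is still readable.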
Big Data/Hadoop Learning
A forum where you can get to know Hadoop concepts and tools.
Hadoop Framework
Doug Cutting and Mike Cafarella started working on an implementation of the solution published by Google (the MapReduce algorithm) and named it Hadoop in 2005. Hadoop runs applications based on the MapReduce algorithm, in which data is processed in parallel on different machines (nodes). It is used to perform statistical analysis on huge volumes of data.
Hadoop is an Apache open-source framework written in Java. It distributes the data across the nodes available in the cluster and then processes it there. Hadoop can scale up from a single machine to many machines, and each machine provides both storage and processing.
The Hadoop modules are given below:
Hadoop Common: Java libraries and utilities required by the other modules
Hadoop YARN: a framework for job scheduling and cluster resource management
Hadoop Distributed File System: used purely for storage; HDFS stores the data that is going to be processed
Hadoop MapReduce: used purely for processing; the MapReduce algorithm is applied to the data stored in HDFS
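To make the division of labour between HDFS and MapReduce concrete, here is a minimal word-count sketch written in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated `key\tvalue` lines and the framework sorts the mapper output by key before the reducer sees it. This runs locally only: Python's `sorted()` stands in for Hadoop's shuffle-and-sort step, the sample lines are invented, and in a real Streaming job the mapper and reducer would be separate scripts reading from standard input.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a 'word\t1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_records):
    """Reduce phase: records arrive sorted by key, so equal words are adjacent."""
    keyed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_job(lines):
    """Local stand-in for a job: map, sort (Hadoop's shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))

# Invented lines standing in for a file stored on HDFS
data = ["Hadoop stores data in HDFS", "MapReduce processes data stored in HDFS"]
for record in run_job(data):
    print(record)
```

The sort between the two phases is what lets the reducer process one word at a time with `groupby`, without holding all counts in memory.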
Traditional approach to storing and processing data
In the traditional approach, every organisation has its own computer to store and process its data. Data is stored in an RDBMS such as Oracle, SQL Server, MySQL, MariaDB or SQLite, and an application can be written to interact with the database and retrieve and process the desired data.
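As a small illustration of this traditional pattern, the sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database: the application creates a table, inserts rows and asks the database for an aggregated result. The `sales` table and its rows are invented for the example; with Oracle, SQL Server or MySQL only the driver and connection details would differ.

```python
import sqlite3

# In-memory SQLite database standing in for an organisation's RDBMS
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 1200.0), ("south", 800.0), ("north", 300.0)],
)

# The application asks the database for exactly the data it needs
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 1500.0), ('south', 800.0)]
conn.close()
```

All storage and computation here happen on one machine, which is exactly the limit the next paragraph describes.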
This approach works well when the volume of data fits within what the RDBMS can hold and what a single processor can handle. It is not suitable for huge volumes of data, because the RDBMS and the single processor become bottlenecks that make large-scale data processing a tedious task.
Google solved this issue with an algorithm called MapReduce. The algorithm splits the original task into a number of smaller tasks and assigns them to multiple computers connected in a network, so the processing happens on all of the computers in parallel. Finally, the results from every individual computer are collected to form the final result dataset.
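The split, assign and collect steps can be sketched on a single machine. In this minimal example, threads from `concurrent.futures.ThreadPoolExecutor` stand in for the networked computers, and the task (summing a list of numbers) is a hypothetical stand-in for real processing. It only shows the shape of the idea, not how Hadoop schedules work across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, num_chunks):
    """Split the original task (a list of numbers) into roughly equal sub-tasks."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    """Work done independently on one 'computer': here, a partial sum."""
    return sum(chunk)

def distributed_sum(data, workers=4):
    chunks = split(data, workers)
    # Threads stand in for separate computers connected in a network
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Collect every partial result to form the final result
    return sum(partial_results)

print(distributed_sum(list(range(1, 101))))  # 5050
```

Summing works well here because partial sums can be combined in any order; MapReduce relies on the same property when it merges results from many machines.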
16/03/2017
The space that data consumes in a storage system is measured in bytes. Data that consumes more than one petabyte of space is referred to as big data.
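The hierarchy of byte units, where each unit is 1024 times the previous one in the traditional binary convention, is: byte, kilobyte (KB), megabyte (MB), gigabyte (GB), terabyte (TB), petabyte (PB), exabyte (EB), zettabyte (ZB) and yottabyte (YB).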
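The arithmetic behind this hierarchy is easy to check in code. The sketch below assumes the binary convention (1 KB = 1024 bytes); under the decimal SI convention each step would be 1000 instead. `unit_size` and `human_readable` are illustrative helper names, not part of any library.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_size(unit):
    """Number of bytes in one unit, using the binary convention (1 KB = 1024 bytes)."""
    return 1024 ** UNITS.index(unit)

def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

print(unit_size("PB"))              # 1125899906842624
print(human_readable(3 * 1024**5))  # 3.00 PB
```

So one petabyte, the threshold mentioned above, is 1024^5 = 1,125,899,906,842,624 bytes.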
The different types of data in a Hadoop environment are listed below:
Structured data: relational (table) data
Semi-structured data: XML data
Unstructured data: PDF files, Word/text documents, application logs
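One way to see the difference between these types is to look at how much work it takes to pull out a single field. In the sketch below, all sample records are invented: a CSV row stands in for table data, an XML fragment for semi-structured data, and a log line for unstructured text. It uses only the Python standard library (`csv`, `xml.etree.ElementTree`, `re`).

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, like a row in a relational table (CSV here)
structured = "id,name,city\n1,Asha,Chennai\n"
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured: tags describe the fields, but there is no fixed table schema
semi_structured = "<customer id='1'><name>Asha</name><city>Chennai</city></customer>"
customer = ET.fromstring(semi_structured)

# Unstructured: free text such as an application log; fields must be extracted
unstructured = "2017-03-16 10:42:01 ERROR Payment failed for customer Asha"
match = re.search(r"(\d{4}-\d{2}-\d{2}) \S+ (\w+) (.*)", unstructured)

print(row["city"])                 # Chennai
print(customer.find("city").text)  # Chennai
print(match.group(2))              # ERROR
```

The structured row is read by column name, the XML by tag name, while the log line needs a hand-written pattern that only works for this particular log format.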
What are the 3 Vs of big data?
Volume, Variety and Velocity are the 3 Vs of big data.
Data volume refers to the amount of data that needs to be processed. Data variety refers to the different types of data, such as structured, semi-structured and unstructured. Data velocity refers to the speed at which data is generated and must be processed.
All three properties must be handled effectively to manage big data.
Big data means data that is huge in volume; data measured in petabytes can be called "big data". Big data is a collection of data so large that it cannot be processed or analysed using traditional computing techniques. Sources of big data are listed below:
Social media, e-commerce websites, telecommunications organisations, share market data, government census portals, etc.
Address: Chennai 600091