Overview of what I have learned so far
It has four parts:
Hadoop is the basic framework of distributed systems. Hadoop deals with large amounts of data in a reliable, efficient, and scalable way. It is reliable because it assumes that computing and storage elements will fail, so it maintains multiple copies of the data and ensures that the work of failed nodes can be redistributed. It is efficient because it works in parallel, speeding jobs up through parallel processing. And Hadoop is scalable: a user can develop a distributed application without knowing the low-level distributed details very well, taking full advantage of the power of high-speed computing clusters and storage.
MapReduce is a programming model for the parallel processing of large amounts of data. It has two core concepts: Map and Reduce. The main ideas of MapReduce come from functional programming languages and vector programming languages. It makes it very convenient for programmers who do not know distributed parallel computing to run their programs on a distributed system.
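To make the Map and Reduce concepts concrete, here is a toy word-count sketch in plain Python. It only simulates the model on one machine; the function names and the sample documents are my own, not real Hadoop APIs.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input."""
    for word in document.split():
        yield (word, 1)

def reduce_phase(pairs):
    """Reduce: after a shuffle/sort by key, sum the counts per word."""
    pairs = sorted(pairs, key=itemgetter(0))  # the "shuffle" step
    return {word: sum(count for _, count in group)
            for word, group in groupby(pairs, key=itemgetter(0))}

docs = ["hadoop stores data", "hadoop processes data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(pairs)
# counts == {"data": 2, "hadoop": 2, "processes": 1, "stores": 1}
```

In real MapReduce, each `map_phase` call would run on a different node near its input split, and the framework, not the programmer, would handle the shuffle and the failure recovery.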
The Hadoop Distributed File System (HDFS) is a highly fault-tolerant system designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with large data sets. HDFS is an important part of the Hadoop project and was developed as part of the basic infrastructure of the open-source Apache projects.
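The fault tolerance above rests on splitting files into blocks and replicating each block on several nodes. This toy sketch shows the idea; the block size, node names, and round-robin placement are illustrative assumptions (real HDFS uses 128 MB blocks and a rack-aware placement policy, with a default replication factor of 3).

```python
BLOCK_SIZE = 4    # bytes per block (toy value; real HDFS default is 128 MB)
REPLICATION = 3   # copies of each block (HDFS default is also 3)
NODES = ["node1", "node2", "node3", "node4"]  # made-up node names

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    return {i: [nodes[(i + r) % len(nodes)] for r in range(replication)]
            for i in range(len(blocks))}

blocks = split_into_blocks(b"hello hdfs!")
placement = place_blocks(blocks)
# 3 blocks, each stored on 3 distinct nodes: if any one node fails,
# every block still has 2 surviving copies.
```

This is why losing a cheap machine is expected rather than catastrophic: the NameNode simply schedules new copies of the under-replicated blocks.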
HBase is a distributed, column-oriented open-source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable uses the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a sub-project of Apache Hadoop. Unlike a normal relational database, HBase is a database for storing unstructured data; another difference is that it uses a column-based rather than a row-based model.
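To show what "column-based" means, here is a toy sketch that regroups sparse rows by column, the way an HBase-style store keys cells by column family and qualifier. The table contents and the `info:name` / `info:age` column names are made up for illustration.

```python
# Row-oriented view: each record carries only the cells it actually has.
rows = [
    {"row_key": "user1", "info:name": "Ann", "info:age": "30"},
    {"row_key": "user2", "info:name": "Bob"},  # sparse: no age stored
]

# Column-oriented view: group cells by column, keyed by row key.
# Missing cells are simply absent, which suits sparse, unstructured data.
columns = {}
for row in rows:
    key = row["row_key"]
    for col, value in row.items():
        if col != "row_key":
            columns.setdefault(col, {})[key] = value
# columns == {"info:name": {"user1": "Ann", "user2": "Bob"},
#             "info:age":  {"user1": "30"}}
```

A relational table would have to reserve an `age` column for every row; here the empty cell costs nothing, and scanning one column does not touch the others.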