This system seems similar to Pushpin. Their visualizations are good.
In my presentation I introduced some concepts of developing multitouch table applications. The slides contain a short description of the underlying hardware technologies. As the platform for implementing my prototype, I decided to use the Microsoft Surface SDK. A description of its architecture is also included in my presentation.
Here are the slides from my presentation last week. I talked about (scientific) recommender systems in general, the three categories they can be classified into (content-based, collaborative, hybrid) and I showed some examples of algorithms and applications (TF-IDF, Apache Mahout, SciPlore) where such recommender systems are used.
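To make the content-based category concrete, here is a minimal sketch of TF-IDF weighting combined with cosine similarity to recommend documents similar to one a user liked. It assumes whitespace tokenization, tf = count / document length and idf = log(N / df). The function names are hypothetical, and this is not how Apache Mahout or SciPlore implement it.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (tf * idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()  # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(docs, liked_index, k=2):
    """Return indices of the k documents most similar to the liked one."""
    vectors = tf_idf(docs)
    scores = [(cosine(vectors[liked_index], v), i)
              for i, v in enumerate(vectors) if i != liked_index]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]
```

For example, with the titles `["mapreduce hadoop cluster computing", "hadoop hdfs distributed storage", "recommender systems collaborative filtering", "content based recommender tf idf"]`, `recommend(titles, 0, k=1)` returns `[1]` because only those two share the term "hadoop". A collaborative recommender would instead compare users' ratings rather than document text.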
Citing other authors' work is common practice among computer scientists. This practice helps us detect and explain community effects. It helps in drawing conclusions to justify progress and funding, and in identifying trends and patterns in an evolving field over time. Donors, funding agencies and tenure committees can use this as guidance to make more informed decisions.
Overview of what I have learned so far
There are four parts:
Hadoop is a basic framework for distributed systems. It processes large amounts of data in a reliable, efficient and scalable way. It is reliable because it assumes that computing and storage elements will fail; it therefore maintains multiple copies of the data and ensures that the work of failed nodes can be redistributed. It is efficient because it works in parallel, speeding up processing. And Hadoop is scalable: users can develop distributed applications without knowing the low-level details of distributed systems very well, while taking full advantage of the power of high-speed computing clusters and storage.
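The idea of redistributing work from failed nodes can be illustrated with a toy, single-machine simulation. Each task is first assigned to a node round-robin; if that node is down, the task is retried on the next node. The node names, the `run_on_node` worker and the retry order are all hypothetical, and Hadoop's real scheduler is far more involved.

```python
from concurrent.futures import ThreadPoolExecutor


class NodeFailure(Exception):
    """Raised when a (simulated) worker node is down."""


def run_on_node(node, task, failed_nodes):
    # Hypothetical worker: fails if the node is down, otherwise squares the input.
    if node in failed_nodes:
        raise NodeFailure(node)
    return task * task


def run_task(task_id, task, nodes, failed_nodes):
    # Start on the round-robin node and move to the next node on failure.
    for offset in range(len(nodes)):
        node = nodes[(task_id + offset) % len(nodes)]
        try:
            return node, run_on_node(node, task, failed_nodes)
        except NodeFailure:
            continue
    raise RuntimeError(f"task {task_id} failed on every node")


def run_job(tasks, nodes, failed_nodes):
    # Tasks run in parallel, one thread per node.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(run_task, i, t, nodes, failed_nodes)
                   for i, t in enumerate(tasks)]
        return [f.result() for f in futures]
```

With `run_job([1, 2, 3, 4], ["node-a", "node-b", "node-c"], {"node-b"})`, the task originally assigned to `node-b` is rerun on `node-c`, and the job still returns all four results.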
MapReduce is a programming model for the parallel processing of large data sets. It is built on two concepts: Map and Reduce. The main idea of MapReduce comes from functional programming languages and vector programming languages. It makes it convenient for programmers who do not know distributed parallel computing to run their programs on a distributed system.
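The classic illustration of the model is word count. Below is a minimal in-memory sketch of the three stages: map emits `(word, 1)` pairs, a shuffle groups the pairs by key, and reduce sums each group. It assumes each document is one input split and uses threads to stand in for parallel mappers. It is not the Hadoop Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(mapped_lists):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    # Each document is mapped in parallel, then grouped and reduced.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, documents))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For instance, `word_count(["the cat sat", "The cat ran"])` returns `{"the": 2, "cat": 2, "sat": 1, "ran": 1}`. Because map and reduce are pure functions, a framework can freely run them on different machines, which reflects the functional-programming roots mentioned above.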
The Hadoop Distributed File System (HDFS) is a highly fault-tolerant system designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with large data sets. HDFS is an important part of the Hadoop project. It was originally built as infrastructure for the open-source Apache Nutch web search engine project.
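HDFS's fault tolerance comes from splitting files into large blocks and storing several replicas of each block on different DataNodes (the default replication factor is 3). The sketch below simulates that idea in memory. Placement here is simple round-robin, not HDFS's rack-aware policy, and the function names and tiny block size are assumptions for illustration only.

```python
def store_file(data, nodes, block_size, replication=3):
    """Split `data` into fixed-size blocks and copy each block to
    `replication` distinct nodes (simple round-robin placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    storage = {node: {} for node in nodes}
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            storage[node][block_id] = block
    return storage, len(blocks)


def read_file(storage, num_blocks, failed_nodes):
    """Reassemble the file, reading each block from any live replica."""
    out = bytearray()
    for block_id in range(num_blocks):
        replica = next(
            (node_blocks[block_id]
             for node, node_blocks in storage.items()
             if node not in failed_nodes and block_id in node_blocks),
            None,
        )
        if replica is None:
            raise IOError(f"block {block_id} lost: every replica is on a failed node")
        out += replica
    return bytes(out)
```

With four nodes and three replicas per block, any two nodes can fail and the file is still readable: `read_file(*store_file(b"hello distributed world", ["dn1", "dn2", "dn3", "dn4"], 8), {"dn1", "dn2"})` returns the original bytes.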
HBase is a distributed, column-oriented, open-source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable uses the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a sub-project of Apache Hadoop. Unlike a conventional relational database, HBase is a database for unstructured storage. Another difference is that it uses a column-based rather than a row-based model.
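The Bigtable paper describes the data model as a sparse, multidimensional sorted map indexed by row key, column key and timestamp. HBase groups columns into column families (`family:qualifier`). The toy class below sketches that addressing scheme and versioning in memory. It is not the HBase client API; `ToyTable` and its methods are hypothetical, and it omits sorting, persistence and distribution.

```python
from collections import defaultdict


class ToyTable:
    """In-memory sketch of the Bigtable/HBase data model (hypothetical names).
    A cell is addressed by (row key, "family:qualifier", timestamp)."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> column -> {timestamp: value}; rows only hold the
        # columns actually written, so the table is sparse.
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row][column][ts] = value

    def get(self, row, column, ts=None):
        """Return the newest version at or before `ts` (newest overall if None)."""
        if row not in self.rows:
            return None
        versions = self.rows[row].get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def columns(self, row):
        """List the columns that actually exist in one row."""
        return sorted(self.rows[row]) if row in self.rows else []
```

For example, after `t = ToyTable(["info", "cites"])`, writing `"info:title"` for `"paper1"` at timestamps 1 and 2 means `t.get("paper1", "info:title")` returns the newer value, while `ts=1` returns the older one. A row that only has `"cites:paper1"` stores nothing for the `info` family, which is what makes the model suitable for sparse, loosely structured data.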