Monthly Archives: December 2011

Scientific Recommender Systems Prototype

Today, I want to introduce my ideas for the recommenders in my prototype. Basically, I want to create a simple web application with JSP that provides different recommenders and corresponding visualizations based on the given MySQL dump. The following is a list of the recommenders I am thinking of, with a few thoughts on the implementation and sketches of possible visualizations.

1.) recommend papers based on citations as boolean preferences between papers (collaborative filtering)

implementation with Mahout: Create a data model based on boolean preferences (i.e. an association either exists or does not), then run the recommender with the different similarity metrics contained in Mahout (those that can work with boolean preferences), and evaluate and compare them.
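One of Mahout's similarity metrics that works with boolean preferences is the Tanimoto coefficient, which at its core is the Jaccard coefficient over the two papers' citation sets. A minimal sketch of that computation (the class and method names are mine, independent of Mahout's API):

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the Tanimoto (Jaccard) coefficient over boolean preferences:
 *  |intersection| / |union| of the two papers' citation sets. */
public class TanimotoSketch {

    public static double tanimoto(Set<Long> citedByA, Set<Long> citedByB) {
        if (citedByA.isEmpty() && citedByB.isEmpty()) {
            return 0.0; // no evidence at all, treat as not similar
        }
        Set<Long> intersection = new HashSet<>(citedByA);
        intersection.retainAll(citedByB);
        Set<Long> union = new HashSet<>(citedByA);
        union.addAll(citedByB);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Papers A and B share 2 of 4 distinct citing papers.
        System.out.println(tanimoto(Set.of(1L, 2L, 3L), Set.of(2L, 3L, 4L))); // 0.5
    }
}
```

In Mahout itself this would be handled by plugging one of the library's boolean-capable similarity implementations into the recommender rather than computing it by hand.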

2.) recommend papers based on cocitation

implementation: If I understand the contents of the co_citation view correctly (the count of co-citations between two papers), this would simply be a maximum search with the ID of the input paper as one of the IDs in the view.

possible visualization:

(A denotes the recommendation, # the number of cocitations between the input paper and A)
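If the view really has that shape, the maximum search can be sketched as follows; the same code would serve the bibliographic coupling case in 3.). The Row record and its fields are my assumptions about the schema, not taken from the actual dump:

```java
import java.util.List;

/** Sketch of the "maximum search" over rows of the co_citation view
 *  (and, identically, the bib_coupling view). */
public class CoCitationSearch {

    /** One row of the view: two paper IDs and their co-citation count. */
    public record Row(long paper1, long paper2, int count) {}

    /** Returns the ID of the paper most often co-cited with inputPaper,
     *  or -1 if the view contains no row mentioning it. */
    public static long recommend(List<Row> view, long inputPaper) {
        long best = -1;
        int bestCount = -1;
        for (Row r : view) {
            long other;
            if (r.paper1() == inputPaper) other = r.paper2();
            else if (r.paper2() == inputPaper) other = r.paper1();
            else continue; // row does not involve the input paper
            if (r.count() > bestCount) {
                bestCount = r.count();
                best = other;
            }
        }
        return best;
    }
}
```

In practice the maximum search would of course be done in SQL against the view (ORDER BY count with a LIMIT); the loop above just makes the logic explicit.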

3.) recommend papers based on bibliographic coupling

implementation: Again, if I understand the contents of the bib_coupling view correctly (the count of bibliographic couplings between two papers), this would simply be a maximum search with the ID of the input paper as one of the IDs in the view.

possible visualization:

(A denotes the recommendation, # the number of bibliographic couplings between the input paper and A)

4.) recommend papers based on common keywords

implementation with Mahout: Create an item-based recommender with an ItemSimilarity class that computes the similarity between two papers based on their shared keywords.
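The core of such an ItemSimilarity could be the fraction of shared keywords. A real implementation would implement Mahout's ItemSimilarity interface and look the keywords up in the database; the sketch below just takes them as a map, and all names are mine:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of a keyword-based paper similarity: shared keywords
 *  divided by all distinct keywords of the two papers. */
public class KeywordSimilarity {

    private final Map<Long, Set<String>> keywordsByPaper;

    public KeywordSimilarity(Map<Long, Set<String>> keywordsByPaper) {
        this.keywordsByPaper = keywordsByPaper;
    }

    /** Similarity in [0,1]; 0 if either paper has no known keywords. */
    public double itemSimilarity(long paper1, long paper2) {
        Set<String> k1 = keywordsByPaper.getOrDefault(paper1, Set.of());
        Set<String> k2 = keywordsByPaper.getOrDefault(paper2, Set.of());
        if (k1.isEmpty() || k2.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(k1);
        shared.retainAll(k2);
        Set<String> all = new HashSet<>(k1);
        all.addAll(k2);
        return (double) shared.size() / all.size();
    }
}
```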

5.) recommend people based on co-authorship (collaborative filtering)

implementation with Mahout: Co-authorship as preferences between authors (so people who have often written together have a high preference for each other), then a user-based recommender to find similar people
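Turning co-authorship into preference values could look like the sketch below: every pair of co-authors gets a preference equal to the number of papers they wrote together, which could then be fed into a Mahout data model. Class and method names are mine:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: derive co-authorship "preferences" from paper author lists. */
public class CoAuthorPreferences {

    /** Returns author -> (co-author -> number of joint papers).
     *  Input: one list of author IDs per paper. */
    public static Map<Long, Map<Long, Integer>> build(List<List<Long>> papers) {
        Map<Long, Map<Long, Integer>> prefs = new HashMap<>();
        for (List<Long> authors : papers) {
            for (long a : authors) {
                for (long b : authors) {
                    if (a == b) continue; // no preference for oneself
                    prefs.computeIfAbsent(a, k -> new HashMap<>())
                         .merge(b, 1, Integer::sum);
                }
            }
        }
        return prefs;
    }
}
```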

6.) recommend people based on event participation

implementation with Mahout: Again co-authorship as preferences between authors, this time with an item-based recommender (create an ItemSimilarity class that computes the similarity between two authors based on their common event participation). Recommendations should then be something like authors who often participated in the same events as the input author and/or his co-authors but never wrote a paper together with the input author.
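The filtering step at the end of 6.) can be sketched as follows: among all authors, pick the one sharing the most event participations with the input author while excluding existing co-authors. All names here are mine, not from the database schema:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: recommend the author with the most shared event
 *  participations who has never co-authored with the input author. */
public class EventRecommender {

    public static long recommend(Map<Long, Set<Long>> eventsByAuthor,
                                 Set<Long> coAuthorsOfInput,
                                 long inputAuthor) {
        Set<Long> inputEvents = eventsByAuthor.getOrDefault(inputAuthor, Set.of());
        long best = -1;
        int bestShared = 0;
        for (Map.Entry<Long, Set<Long>> e : eventsByAuthor.entrySet()) {
            long candidate = e.getKey();
            if (candidate == inputAuthor || coAuthorsOfInput.contains(candidate)) {
                continue; // never recommend the input author or an existing co-author
            }
            Set<Long> shared = new HashSet<>(e.getValue());
            shared.retainAll(inputEvents);
            if (shared.size() > bestShared) {
                bestShared = shared.size();
                best = candidate;
            }
        }
        return best; // -1 if no suitable author found
    }
}
```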


Posted by on 12.12.2011 in Java, Mahout, Recommender Systems, Seminar Phase



Linked Data: Evolving the Web into a Global Data Space

Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.

This book gives an overview of the principles of Linked Data as well as the Web of Data that has emerged through the application of these principles. The book discusses patterns for publishing Linked Data, describes deployed Linked Data applications and examines their architecture.

The free HTML version of the book can be found here.



Good estimating tool for agile team

With this post I am just sharing my experience [so that it might be useful for anyone working on an agile-related topic]; I don't mean to say it is the best tool or anything like that.

Planning poker is a nice way to make estimates in a medium-sized agile team. Combined with a group chat application like Skype, it is one of the best ways to estimate for a medium-sized agile team whose members are distributed across different locations.

The following briefly describes how planning poker can be used:

A moderator [for example, the project manager or team leader] hosts the game and invites the others, and the other team members join. The host then writes down a task for everyone to estimate, and each person gives his or her own independent estimate. If someone's estimate disagrees strongly with the majority, he or she has to explain the reasoning behind it [this can be done via Skype]. If the reasoning is convincing, the host restarts the estimation for that task. Once all members give similar estimates, the moderator decides on the estimate for that task. In this way all the tasks are estimated, and the sum of the task estimates is then used to derive the development time estimate for the project or module.
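The round logic above can be sketched in code. The rule for what counts as a strongly disagreeing estimate (more than double or less than half the median here) is my assumption; in practice the team decides what counts as disagreement:

```java
import java.util.List;

/** Sketch of one planning-poker round: detect whether any estimate
 *  disagrees strongly enough with the rest to force a discussion. */
public class PokerRound {

    public static double median(List<Integer> estimates) {
        List<Integer> sorted = estimates.stream().sorted().toList();
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** True if the round must be discussed and restarted. */
    public static boolean needsDiscussion(List<Integer> estimates) {
        double m = median(estimates);
        for (int e : estimates) {
            if (e > 2 * m || e < m / 2) return true; // assumed disagreement rule
        }
        return false;
    }
}
```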


Posted by on 08.12.2011 in agile, xp



Time Series Visualization with Cube

Today I stumbled upon Cube, an Open Source system for visualizing time series data. The system is based on Node.js, MongoDB and D3.js. The developers of the “half-baked but still tasty” tool describe Cube as:

an open-source system for visualizing time series data, built on MongoDB, Node and D3. If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards. Cube speaks WebSockets for low-latency, asynchronous input and output: new events are streamed in, and requested metrics are streamed out as they are computed. (You can also POST events to Cube, if that’s your thing, and collectd integration is included!) Metrics are cached in capped collections, and simple reductions such as sum and max use pyramidal aggregation to improve performance. Visualizations are generated client-side and assembled into dashboards with a few mouse clicks.
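The quoted passage mentions that events can also be POSTed to Cube. As a sketch, such a timestamped event could look like the following; the field names (type, time, data) follow Cube's description of typed, timestamped events, while the concrete event contents and the collector port mentioned in the comment are my assumptions, and no request is actually sent:

```java
/** Sketch: build the JSON payload for one timestamped Cube event. */
public class CubeEventSketch {

    public static String event(String type, String isoTime, int value) {
        // Cube's collector accepts a JSON array of such events via HTTP POST
        // (by default on port 1080); here we only build the payload string.
        return "[{\"type\": \"" + type + "\", "
             + "\"time\": \"" + isoTime + "\", "
             + "\"data\": {\"value\": " + value + "}}]";
    }

    public static void main(String[] args) {
        System.out.println(event("request", "2011-12-16T12:00:00Z", 42));
    }
}
```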

They also share a video on building an analytical dashboard in 60 seconds (the video is actually only 31 seconds long though), which shows the capabilities and speed of Cube.

I think it could be a strong candidate for future implementation in the project group. What do you think?




Social Network Metrics