Tools currently available for extracting metadata from research papers include:
FreeCite is an open-source application that parses document citations into fielded data. FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields.
As an increasing number of authors are publishing articles online, there is a very real need for a simple way to build links between an online document and the documents which it cites. ParaCite was created in parallel with the EPrints.org software as a possible solution to this problem, and has since grown into a usable yet powerful system for both reference parsing and location.
This paper describes a simple method for extracting metadata fields from citations using hidden Markov models. The method is easy to implement and can achieve levels of precision and recall for heterogeneous citations comparable to or greater than other HMM-based methods. The method consists largely of string manipulation and otherwise depends only on an implementation of the Viterbi algorithm, which is widely available, and so can be implemented by diverse digital library systems.
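To make the HMM idea above concrete, here is a minimal sketch of Viterbi decoding applied to a tokenized citation. It is not the method from the paper: the three states (AUTHOR, YEAR, TITLE), the coarse token symbols (CAP, LOW, NUM), and every probability below are hypothetical values chosen by hand for illustration, whereas a real system would estimate them from labeled citations.

```python
import math

# Hypothetical toy model: states, symbols, and probabilities are
# illustrative assumptions, not values taken from the paper.
STATES = ["AUTHOR", "YEAR", "TITLE"]
START_P = {"AUTHOR": 0.8, "YEAR": 0.1, "TITLE": 0.1}
TRANS_P = {
    "AUTHOR": {"AUTHOR": 0.6, "YEAR": 0.3, "TITLE": 0.1},
    "YEAR": {"AUTHOR": 0.05, "YEAR": 0.05, "TITLE": 0.9},
    "TITLE": {"AUTHOR": 0.05, "YEAR": 0.05, "TITLE": 0.9},
}
EMIT_P = {
    "AUTHOR": {"CAP": 0.9, "LOW": 0.05, "NUM": 0.05},
    "YEAR": {"CAP": 0.01, "LOW": 0.01, "NUM": 0.98},
    "TITLE": {"CAP": 0.3, "LOW": 0.65, "NUM": 0.05},
}


def to_symbol(token):
    """Map a token to a coarse symbol class (plain string manipulation)."""
    if token.isdigit() and len(token) == 4:
        return "NUM"
    if token[:1].isupper():
        return "CAP"
    return "LOW"


def _log(p):
    """Log-probability that tolerates zero entries."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def tag_citation(citation):
    """Label each whitespace-separated token of a citation with a field name."""
    symbols = [to_symbol(tok) for tok in citation.split()]
    return viterbi(symbols, STATES, START_P, TRANS_P, EMIT_P)


if __name__ == "__main__":
    text = "Smith Jones 2004 Parsing citations with hidden models"
    for token, label in zip(text.split(), tag_citation(text)):
        print(f"{token}\t{label}")
```

With these hand-picked numbers, the example citation is labeled as two AUTHOR tokens, one YEAR token, and five TITLE tokens. Working in log space avoids numeric underflow on longer citations, which is a common choice for Viterbi implementations rather than anything specific to the paper.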