Mathematical Document Retrieval System Based on Signature Hashing

Sourish Dhar
Sudipta Roy


Scientific documents and magazines contain a large number of mathematical expressions and formulae alongside text. The continuous growth of such documents necessitates specialized tools and techniques that can handle and analyse mathematical expressions and formulae. Mathematical expressions and formulae are highly structured and quite different from traditional text. As a result, conventional text retrieval systems perform poorly when retrieving scientific documents based on a query formulated as a mathematical expression. Mathematical information retrieval is concerned with finding information in documents that include mathematics. To address the challenges posed by mathematical formulae as compared to text, this paper aims to construct a math-aware search engine that can retrieve relevant scientific documents based on a mathematical query. A novel signature-based hashing scheme for indexing raw mathematical web documents is proposed, which also takes mathematical notational equivalences into account. The proposed system demonstrates better precision and stability of the ranked results when compared with other related state-of-the-art math-aware search engines.

