As additional support, we will also consult two other papers:
Two Purposes for Matrix Factorization: A Historical Appraisal
Lawrence Hubert, Jacqueline Meulman, and Willem Heiser, SIAM Review, 42(1):68-82, March 2000.
Automatic Nonzero Structure Analysis
Aart J. C. Bik and Harry A. G. Wijshoff, SIAM Journal on Computing, 28(5):1576-1587, 1999.
Retrieving information from large, dynamic databases currently poses many difficulties. Following Berry's paper, we will explore automated information retrieval (IR) with the vector space model, which is used to manage and index large text collections. In this model, each document is encoded as a vector, and each vector component reflects the importance of a particular term in representing the semantics or meaning of that document. The vectors for all documents in a database are then stored as the columns of a single term-by-document matrix. A user's query is represented the same way, as a vector over the same terms, and relevant documents are identified via simple vector operations, such as measuring the cosine of the angle between the query vector and each document vector.
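To make the query step concrete, here is a minimal sketch in Python (rather than the Matlab we plan to use), assuming a hypothetical toy term-by-document matrix of raw term counts and cosine similarity as the vector operation. The names `A`, `TERMS`, `cosine`, and `rank_documents`, and the `threshold` cutoff, are our own illustrative choices, not taken from Berry's paper.

```python
import math

# Hypothetical toy term-by-document matrix: one row per term, one column
# per document. Entries are raw term counts (no weighting scheme applied).
TERMS = ["matrix", "factorization", "retrieval", "query"]
A = [
    [2, 0, 1],  # "matrix"
    [1, 0, 0],  # "factorization"
    [0, 3, 1],  # "retrieval"
    [0, 1, 1],  # "query"
]


def column(matrix, j):
    """Return column j (the vector for document j)."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(matrix, query, threshold=0.0):
    """Score every document against the query; keep those whose cosine
    exceeds the (assumed) threshold, best match first."""
    n_docs = len(matrix[0])
    scores = [(j, cosine(column(matrix, j), query)) for j in range(n_docs)]
    relevant = [(j, s) for j, s in scores if s > threshold]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)


# Query containing the terms "retrieval" and "query".
query = [0, 0, 1, 1]
print(rank_documents(A, query))  # documents 1 and 2; document 0 is dropped
```

Document 0 shares no terms with the query, so its cosine is 0 and it falls below the cutoff. A real system would also weight the raw counts (for example, by term frequency and inverse document frequency) before comparing vectors.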
An important concept in this paper, matrix factorization, was introduced in CS20. While QR factorization was discussed in detail in class, the Singular Value Decomposition (SVD), which is considerably more involved to compute, is entirely new to us. Implementing both factorizations so that they can handle relatively large matrices will be quite challenging. We will also familiarize ourselves with modern information retrieval techniques and attempt a practical implementation of this IR method that can cope with real-world, dynamic database systems.
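As a reminder of what a QR factorization produces, the sketch below factors an m-by-n matrix A with full column rank into Q, whose columns are orthonormal, and an upper triangular R, so that A = QR. It uses modified Gram-Schmidt in plain Python for readability; this is an assumed illustrative choice, and library routines such as Matlab's `qr` generally rely on Householder reflections, which are more numerically robust. The function name `qr_mgs` is hypothetical.

```python
import math


def qr_mgs(A):
    """Thin QR factorization A = QR by modified Gram-Schmidt.

    A is an m x n matrix given as a list of rows, with m >= n and full
    column rank. Returns Q (m x n, orthonormal columns) and R (n x n,
    upper triangular), both as lists of rows.
    """
    m, n = len(A), len(A[0])
    # V[j] holds column j of A; it is orthogonalized in place as we go.
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    R = [[0.0] * n for _ in range(n)]
    q_cols = []
    for j in range(n):
        # Diagonal of R: length of what remains of column j.
        R[j][j] = math.sqrt(sum(x * x for x in V[j]))
        if R[j][j] < 1e-12:
            raise ValueError("A does not have full column rank")
        q = [x / R[j][j] for x in V[j]]
        q_cols.append(q)
        # Subtract the component along q from every later column.
        for k in range(j + 1, n):
            R[j][k] = sum(a * b for a, b in zip(q, V[k]))
            V[k] = [v - R[j][k] * qi for v, qi in zip(V[k], q)]
    # Transpose the list of columns back into a list of rows.
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

For a term-by-document matrix with full column rank, the columns of Q form an orthonormal basis for the space spanned by the document vectors.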
We plan to use Matlab for the linear-algebra-intensive algorithms, and Perl or Tcl for web-based, real-world database applications. Since Matlab and Perl are both new to us, learning these tools will be part of the project.
Page created on 04-25-2000