Here are definitions of LSI:
1) We now discuss the approximation of a term-document matrix $C$ by one of lower rank using the SVD. The low-rank approximation to $C$ yields a new representation for each document in the collection. We will cast queries into this low-rank representation as well, enabling us to compute query-document similarity scores in this low-rank representation. This process is known as latent semantic indexing (generally abbreviated LSI).
Source: nlp.stanford.edu
2) Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis).
Source: Wikipedia
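Both definitions rest on the same computation: a truncated SVD $C \approx C_k = U_k \Sigma_k V_k^T$ of a terms × documents matrix $C$. The NumPy code below is a minimal sketch under stated assumptions, not a reference implementation. The toy matrix is made-up data, the helper names (`truncated_svd`, `fold_in`, `cosine_rows`) are hypothetical, and the query is mapped into the latent space with the usual folding-in rule $q_k = \Sigma_k^{-1} U_k^T q$. For the LSA side it treats the rows of $U_k \Sigma_k$ as term "concept" vectors, which is one common convention among several.

```python
import numpy as np

# Toy terms x documents count matrix (made-up data for illustration only).
TERMS = ["ship", "boat", "ocean", "voyage", "trip"]
C = np.array([
    [1, 0, 1, 0, 0, 0],  # ship
    [0, 1, 0, 0, 0, 0],  # boat
    [1, 1, 0, 0, 0, 0],  # ocean
    [1, 0, 0, 1, 1, 0],  # voyage
    [0, 0, 0, 1, 0, 1],  # trip
], dtype=float)


def truncated_svd(C: np.ndarray, k: int):
    """Keep the k largest singular triplets, so C_k = U_k @ diag(s_k) @ Vt_k."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Map a term-space vector to the latent space: q_k = S_k^{-1} U_k^T q.

    Folding in column j of C yields column j of Vt_k, so documents and
    queries share the same k-dimensional representation.
    """
    return (U_k.T @ q) / s_k


def cosine_rows(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of A and every row of B."""
    A = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    B = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
    return A @ B.T


k = 2
U_k, s_k, Vt_k = truncated_svd(C, k)

# Definition 1 (LSI): score every document against a query in the latent space.
q = np.array([0, 1, 0, 0, 0], dtype=float)        # query consisting of "boat"
q_k = fold_in(q, U_k, s_k)
doc_scores = cosine_rows(q_k[None, :], Vt_k.T)[0]  # one score per document
print("LSI scores:", np.round(doc_scores, 3))

# Definition 2 (LSA): terms as k-dim "concept" vectors (rows of U_k @ diag(s_k)).
term_vecs = U_k * s_k
term_sim = cosine_rows(term_vecs, term_vecs)
i, j = TERMS.index("ship"), TERMS.index("boat")
print("raw cos(ship, boat):", cosine_rows(C[i:i + 1], C[j:j + 1])[0, 0])
print("LSA cos(ship, boat):", round(float(term_sim[i, j]), 3))
```

In the raw counts "ship" and "boat" never share a document, so their cosine similarity is zero; in the latent space they can move closer because both co-occur with "ocean", which is the distributional hypothesis at work. The script prints both values rather than assuming them, and changing `k` shows how the amount of smoothing depends on the chosen rank.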