Full-Text Search Technology Document Archive

The following is a collection of PDF documents related to full-text web search,
inverted indexes, PageRank, query weighting, geo search, lexicon building, etc.

Note: we may want to move these to dev03 for indexing, and we will likely
want to give them better titles (06/23/03 pw)

  • 03 web search readthis   ( 181K)
  • adr good   ( 98K)
  • algo phonetic sequences   ( 90K)
  • altavista sdk   ( 459K)
  • auth hubs sources.pfd   ( 310K)
  • auth sources   ( 262K)
  • AutomaticLexMaintenance   ( 35K)
  • berkeleydb overview   ( 77K)
  • bind9arm   ( 660K)
  • biocrawl   ( 153K)
  • breadth first crawl   ( 114K)
  • brown94fast   ( 118K)
  • buildingfulltext   ( 2.29M)
  • buying guide search   ( 240K)
  • clean w 05   ( 857K)
  • cluster methods   ( 136K)
  • complementing search engines data   ( 603K)
  • computing pagerank   ( 568K)
  • conjunctive queries   ( 471K)
  • constructing similarity via rogets   ( 1.78M)
  • Contextual Network Graphs.pdf   ( 239K)
  • corpus big not better   ( 183K)
  • crawler distributed   ( 329K)
  • CS TR 4291 NICE   ( 312K)
  • dewey intro   ( 194K)
  • dewey new features   ( 135K)
  • dns   ( 408K)
  • doc rais en   ( 205K)
  • doc rank en   ( 234K)
  • doc rank   ( 238K)
  • efficient pagerank   ( 568K)
  • est local pagerank.pfd   ( 222K)
  • eval distr arch performance   ( 254K)
  • ExtMemory massive   ( 828K)
  • fastdocclustering   ( 201K)
  • findingrelatedpages   ( 315K)
  • focus1   ( 306K)
  • focus2   ( 126K)
  • focus3   ( 272K)
  • focusreview   ( 210K)
  • geo   ( 121K)
  • geosearch   ( 121K)
  • google1   ( 123K)
  • google class11   ( 112K)
  • google crawling order   ( 107K)
  • google datamining   ( 1.58M)
  • google find similar   ( 153K)
  • google iceberg   ( 367K)
  • google pagerank2   ( 306K)
  • google pagerank   ( 290K)
  • google pattern extraction   ( 227K)
  • google   ( 123K)
  • gridpointloc   ( 220K)
  • highperfwebcrawling   ( 139K)
  • hits sw   ( 824K)
  • indexing workshop   ( 305K)
  • inktomi   ( 233K)
  • intervoicecmplt   ( 6.64M)
  • inverted files good ranking algor   ( 389K)
  • inverted large parallel indexing   ( 1.07M)
  • invertedperf   ( 278K)
  • inverted vs signature   ( 243K)
  • ixe architecture   ( 145K)
  • ixe docrank   ( 234K)
  • IXE EN   ( 67K)
  • IXEsheet eng   ( 74K)
  • Kleinfeldt   ( 188K)
  • LargeCollsSmTexts   ( 583K)
  • LaTaT language and text analysis   ( 107K)
  • lec2long 2   ( 68K)
  • lexicon squeeze   ( 93K)
  • linktext queryresolution   ( 85K)
  • linux1 redhat1   ( 180K)
  • li   ( 33K)
  • load balancing cluster   ( 127K)
  • lucene performance   ( 99K)
  • make inverted   ( 857K)
  • medical query precision   ( 43K)
  • mercator   ( 112K)
  • metacrawler1   ( 166K)
  • meta scalable   ( 248K)
  • metasearch textimage   ( 798K)
  • mining the web   ( 1.49M)
  • muramatsu01transparent   ( 300K)
  • news metasearch   ( 221K)
  • news search   ( 534K)
  • Nodalida03final   ( 36K)
  • NorthernLight ESEWhitePaper   ( 282K)
  • northernlight techpaper   ( 237K)
  • odissea   ( 110K)
  • parallel integer sorting   ( 449K)
  • pertimm   ( 3.84M)
  • phrase based clustering se results   ( 809K)
  • phrase querying   ( 145K)
  • profile based search   ( 946K)
  • query exec tuning   ( 367K)
  • query refinement model   (22.10M)
  • rank aggregation   ( 147K)
  • rank co occur   ( 359K)
  • ranking nist   ( 102K)
  • ranking titlecat   ( 67K)
  • resultcaching   ( 270K)
  • reuters corpus classification   ( 66K)
  • reuters corpus info   ( 251K)
  • santy spellchecker   ( 166K)
  • savvysearch   ( 243K)
  • scriptie   ( 648K)
  • search2   ( 177K)
  • search and wordnet   ( 249K)
  • search engine spam detection   ( 889K)
  • Search Engine Technology   ( 1.75M)
  • searching large lexicons   ( 1.27M)
  • SEdesign2   ( 235K)
  • SEdesign   ( 1.75M)
  • semi automated discovery   ( 793K)
  • shoco2002   ( 1.45M)
  • spellcheck includes perl source   ( 648K)
  • SpellCheckInSearchEngine   ( 25K)
  • SpellingIR   ( 46K)
  • Spider iscit2002   ( 249K)
  • SRC 1998 014   ( 68K)
  • stem good   ( 296K)
  • systemimager manual   ( 467K)
  • terrascale io solutions   ( 516K)
  • TextsumEval   ( 37K)
  • thesis specialized search   ( 568K)
  • topic pagerank   ( 195K)
  • trec   ( 771K)
  • Ultra5.4 Admin   ( 3.33M)
  • urlcrawlingorder   ( 118K)
  • V7DeployLic   ( 89K)
  • V7DevLic   ( 63K)
  • V7n6SrvrLic   ( 59K)


  • Allen Hayden Harvard
    Allen Hayden Senior Database Consultant