Entry | Modern Information Retrieval (English Edition, 2nd Edition) |
Description | Basic information
Original title: Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition)
Original publisher: Addison-Wesley Professional
Authors: Ricardo Baeza-Yates (Spain), Berthier Ribeiro-Neto (Brazil)
Series: Classic Original Edition Library (经典原版书库)
Publisher: China Machine Press
ISBN: 9787111331742
Date listed: 2011-03-07
Publication date: March 2011
Format: 32mo
Pages: 913
Summary
Modern Information Retrieval (English Edition, 2nd Edition) covers in detail all the major concepts and techniques of information retrieval, along with the latest developments in the field, giving readers both a comprehensive overview of modern information retrieval and detailed knowledge of each of its key topics. The core of the book is written by Baeza-Yates and Ribeiro-Neto, leading figures in the field; for readers who want to go deeper into key areas, it also includes chapters by other leading researchers on the state of the art in specialized topics.
Compared with the first edition, this edition has been extensively reorganized, updated, and expanded in both content and structure, with roughly 60% to 70% new material. The main changes are:
·New chapters on text classification, web crawling, structured text retrieval, and enterprise search, plus an appendix on open source search.
·Completely rewritten coverage of user interfaces, multimedia retrieval, and digital libraries.
·Expanded chapters on important recent advances in information retrieval, such as language models, new evaluation methods, query characteristics, and cluster-based and distributed information retrieval.
Table of Contents
Front matter: Preface to the Second Edition; Preface to the First Edition; Authors’ Acknowledgements to the Second Edition; Authors’ Acknowledgements to the First Edition; Publishers’ Acknowledgements
1 Introduction
1.1 Information Retrieval; 1.1.1 Early Developments; 1.1.2 Information Retrieval in Libraries and Digital Libraries; 1.1.3 IR at the Center of the Stage; 1.2 The IR Problem; 1.2.1 The User’s Task; 1.2.2 Information versus Data Retrieval; 1.3 The IR System; 1.3.1 Software Architecture of the IR System; 1.3.2 The Retrieval and Ranking Processes; 1.4 The Web; 1.4.1 A Brief History; 1.4.2 The E-Publishing Era; 1.4.3 How the Web Changed Search; 1.4.4 Practical Issues on the Web; 1.5 Organization of the Book; 1.5.1 Focus of the Book; 1.5.2 Book Contents; 1.6 The Book Web Site: A Teaching Resource; 1.7 Bibliographic Discussion
2 User Interfaces for Search (by Marti Hearst)
2.1 Introduction; 2.2 How People Search; 2.2.1 Information Lookup versus Exploratory Search; 2.2.2 Classic versus Dynamic Model of Information Seeking; 2.2.3 Navigation versus Search; 2.2.4 Observations of the Search Process; 2.3 Search Interfaces Today; 2.3.1 Getting Started; 2.3.2 Query Specification; 2.3.3 Query Specification Interfaces; 2.3.4 Retrieval Results Display; 2.3.5 Query Reformulation; 2.3.6 Organizing Search Results; 2.4 Visualization in Search Interfaces; 2.4.1 Visualizing Boolean Syntax; 2.4.2 Visualizing Query Terms within Retrieval Results; 2.4.3 Visualizing Relationships among Words and Documents; 2.4.4 Visualization for Text Mining; 2.5 Design and Evaluation of Search Interfaces; 2.6 Trends and Research Issues; 2.7 Bibliographic Discussion
3 Modeling
3.1 IR Models; 3.1.1 Modeling and Ranking; 3.1.2 Characterization of an IR Model; 3.1.3 A Taxonomy of IR Models; 3.2 Classic Information Retrieval; 3.2.1 Basic Concepts; 3.2.2 The Boolean Model; 3.2.3 Term Weighting; 3.2.4 TF-IDF Weights; 3.2.5 Document Length Normalization; 3.2.6 The Vector Model; 3.2.7 The Probabilistic Model; 3.2.8 Brief Comparison of Classic Models; 3.3 Alternative Set Theoretic Models; 3.3.1 Set-Based Model; 3.3.2 Extended Boolean Model; 3.3.3 Fuzzy Set Model; 3.4 Alternative Algebraic Models; 3.4.1 Generalized Vector Space Model; 3.4.2 Latent Semantic Indexing Model; 3.4.3 Neural Network Model; 3.5 Alternative Probabilistic Models; 3.5.1 BM25; 3.5.2 Language Models; 3.5.3 Divergence from Randomness; 3.5.4 Bayesian Network Models; 3.6 Other Models; 3.6.1 The Hypertext Model; 3.6.2 Web Based Models; 3.6.3 Structured Text Retrieval; 3.6.4 Multimedia Retrieval; 3.6.5 Enterprise and Vertical Search; 3.7 Trends and Research Issues; 3.8 Bibliographic Discussion
4 Retrieval Evaluation
4.1 Introduction; 4.2 The Cranfield Paradigm; 4.2.1 A Brief History; 4.2.2 Reference Collections; 4.3 Retrieval Metrics; 4.3.1 Precision and Recall; 4.3.2 Single Value Summaries: P@n, MAP, MRR, F; 4.3.3 User-Oriented Measures; 4.3.4 DCG: Discounted Cumulated Gain; 4.3.5 BPREF: Binary Preferences; 4.3.6 Rank Correlation Metrics; 4.4 Reference Collections; 4.4.1 The TREC Collections; 4.4.2 Other Reference Collections; 4.4.3 Other Small Test Collections; 4.5 User-Based Evaluation; 4.5.1 Human Experimentation in the Lab; 4.5.2 Side-by-Side Panels; 4.5.3 A/B Testing; 4.5.4 Crowdsourcing; 4.5.5 Evaluation Using Clickthrough Data; 4.6 Practical Caveats; 4.7 Trends and Research Issues; 4.8 Bibliographic Discussion
5 Relevance Feedback and Query Expansion
5.1 Introduction; 5.2 A Framework for Feedback Methods; 5.3 Explicit Relevance Feedback; 5.3.1 Relevance Feedback for the Vector Model: Rocchio Method; 5.3.2 Relevance Feedback for the Probabilistic Model; 5.3.3 Evaluation of Relevance Feedback; 5.4 Explicit Feedback Through Clicks; 5.4.1 Eye Tracking and Relevance Judgements; 5.4.2 User Behavior; 5.4.3 Clicks as a Metric of User Preferences; 5.5 Implicit Feedback Through Local Analysis; 5.5.1 Implicit Feedback Through Local Clustering; 5.5.2 Implicit Feedback Through Local Context Analysis; 5.6 Implicit Feedback Through Global Analysis; 5.6.1 Query Expansion Based on a Similarity Thesaurus; 5.6.2 Query Expansion Based on a Statistical Thesaurus; 5.7 Trends and Research Issues; 5.8 Bibliographic Discussion
6 Documents: Languages & Properties (with Gonzalo Navarro and Nivio Ziviani)
6.1 Introduction; 6.2 Metadata; 6.3 Document Formats; 6.3.1 Text; 6.3.2 Multimedia; 6.3.3 Graphics and Virtual Reality; 6.4 Markup Languages; 6.4.1 SGML; 6.4.2 HTML; 6.4.3 XML; 6.4.4 RDF: Resource Description Framework; 6.4.5 HyTime; 6.5 Text Properties; 6.5.1 Information Theory; 6.5.2 Modeling Natural Language; 6.5.3 Text Similarity; 6.6 Document Preprocessing; 6.6.1 Lexical Analysis of the Text; 6.6.2 Elimination of Stopwords; 6.6.3 Stemming; 6.6.4 Keyword Selection; 6.6.5 Thesauri; 6.7 Organizing Documents; 6.7.1 Taxonomies; 6.7.2 Folksonomies; 6.8 Text Compression; 6.8.1 Basic Concepts; 6.8.2 Statistical Methods; 6.8.3 Statistical Methods: Modeling; 6.8.4 Statistical Methods: Coding; 6.8.5 Dictionary Methods; 6.8.6 Preprocessing for Compression; 6.8.7 Comparing Text Compression Techniques; 6.8.8 Structured Text Compression; 6.9 Trends and Research Issues; 6.10 Bibliographical Discussion
7 Queries: Languages & Properties (with Gonzalo Navarro)
7.1 Query Languages; 7.1.1 Keyword-Based Querying; 7.1.2 Beyond Keywords; 7.1.3 Structural Queries; 7.1.4 Query Protocols; 7.2 Query Properties; 7.2.1 Characterizing Web Queries; 7.2.2 User Search Behavior; 7.2.3 Query Intent; 7.2.4 Query Topic; 7.2.5 Query Sessions and Missions; 7.2.6 Query Difficulty; 7.3 Trends and Research Issues; 7.4 Bibliographical Discussion
8 Text Classification (with Marcos Gonçalves)
8.1 Introduction; 8.2 A Characterization of Text Classification; 8.2.1 Machine Learning; 8.2.2 The Text Classification Problem; 8.2.3 Text Classification Algorithms; 8.3 Unsupervised Algorithms; 8.3.1 Clustering; 8.3.2 Naive Text Classification; 8.4 Supervised Algorithms; 8.4.1 Decision Trees; 8.4.2 The kNN Classifier; 8.4.3 The Rocchio Classifier; 8.4.4 Probabilistic Naive Bayes Document Classification; 8.4.5 The SVM Classifier; 8.4.6 Ensemble Classifiers; 8.4.7 Final Remarks on Supervised Algorithms; 8.5 Feature Selection or Dimensionality Reduction; 8.5.1 Term–Class Incidence Table; 8.5.2 Term Document Frequency; 8.5.3 TF-IDF Weights; 8.5.4 Mutual Information; 8.5.5 Information Gain; 8.5.6 Chi Square; 8.5.7 Impact of Feature Selection; 8.6 Evaluation Metrics; 8.6.1 Contingency Table; 8.6.2 Accuracy and Error; 8.6.3 Precision and Recall; 8.6.4 F-measure and F1; 8.6.5 Cross-Validation; 8.6.6 Standard Collections; 8.7 Organizing the Classes – Building Taxonomies; 8.8 Trends and Research Issues; 8.9 Bibliographic Discussion
9 Indexing and Searching (with Gonzalo Navarro)
9.1 Introduction; 9.2 Inverted Indexes; 9.2.1 Basic Concepts; 9.2.2 Full Inverted Indexes; 9.2.3 Searching; 9.2.4 Ranking; 9.2.5 Construction; 9.2.6 Compressed Inverted Indexes; 9.2.7 Structural Queries; 9.3 Signature Files; 9.4 Suffix Trees and Suffix Arrays; 9.4.1 Structure: Tries and Suffix Trees; 9.4.2 Searching for Simple Strings; 9.4.3 Searching for Complex Patterns; 9.4.4 Construction; 9.4.5 Compressed Suffix Arrays; 9.5 Sequential Searching; 9.5.1 Simple Strings: Horspool; 9.5.2 Complex Patterns: Automata and Bit-Parallelism; 9.5.3 Faster Bit-Parallel Algorithms; 9.5.4 Regular Expressions; 9.5.5 Multiple Patterns; 9.5.6 Approximate Searching; 9.5.7 Searching Compressed Text; 9.6 Multi-Dimensional Indexing; 9.7 Trends and Research Issues; 9.8 Bibliographic Discussion
10 Parallel and Distributed IR (with Eric Brown)
10.1 Introduction; 10.2 A Taxonomy of Distributed IR Systems; 10.3 Data Partitioning; 10.3.1 Collection Partitioning; 10.3.2 Collection Selection; 10.3.3 Inverted Index Partitioning; 10.3.4 Partitioning Other Indexes; 10.4 Parallel IR; 10.4.1 Introduction; 10.4.2 Parallel IR on MIMD Architectures; 10.4.3 Parallel IR on SIMD Architectures; 10.5 Cluster-Based IR; 10.6 Distributed IR; 10.6.1 Introduction; 10.6.2 Indexing; 10.6.3 Query Processing; 10.6.4 Web Issues; 10.7 Federated Search; 10.8 Retrieval in Peer-to-Peer Networks; 10.9 Trends and Research Issues; 10.10 Bibliographic Discussion
11 Web Retrieval (with Yoelle Maarek)
11.1 Introduction; 11.2 A Challenging Problem; 11.3 The Web; 11.3.1 Characteristics; 11.3.2 Structure of the Web Graph; 11.3.3 Modeling the Web; 11.3.4 Link Analysis; 11.4 Search Engine Architectures; 11.4.1 Basic Architecture; 11.4.2 Cluster-Based Architecture; 11.4.3 Caching; 11.4.4 Multiple Indexes; 11.4.5 Distributed Architectures; 11.5 Search Engine Ranking; 11.5.1 Ranking Signals; 11.5.2 Link-Based Ranking; 11.5.3 Simple Ranking Functions; 11.5.4 Learning to Rank; 11.5.5 Learning the Ranking Function; 11.5.6 Quality Evaluation; 11.5.7 Web Spam; 11.6 Managing Web Data; 11.6.1 Assigning Identifiers to Documents; 11.6.2 Metadata; 11.6.3 Compressing the Web Graph; 11.6.4 Handling Duplicated Data; 11.7 Search Engine User Interaction; 11.7.1 The Search Rectangle Paradigm; 11.7.2 The Search Engine Result Page; 11.7.3 Educating the User; 11.8 Browsing; 11.8.1 Flat Browsing; 11.8.2 Structure Guided Browsing and Web Directories; 11.9 Beyond Browsing; 11.9.1 Hypertext and the Web; 11.9.2 Combining Searching with Browsing; 11.9.3 Web Query Languages; 11.9.4 Dynamic Search; 11.10 Related Problems; 11.10.1 Computational Advertising; 11.10.2 Web Mining; 11.10.3 Metasearch; 11.11 Trends and Research Issues; 11.11.1 Beyond Static Text Data; 11.11.2 Current Challenges; 11.12 Bibliographical Discussion
12 Web Crawling (with Carlos Castillo)
12.1 Introduction; 12.2 Applications of a Web Crawler; 12.2.1 General Web Search; 12.2.2 Topical Crawling; 12.2.3 Web Characterization; 12.2.4 Mirroring; 12.2.5 Web Site Analysis; 12.3 A Taxonomy of Crawlers; 12.3.1 Types of Web Pages; 12.4 Architecture and Implementation; 12.4.1 Crawler Architecture; 12.4.2 Practical Issues; 12.4.3 Parallel Crawling; 12.5 Scheduling Algorithms; 12.5.1 Selection Policy; 12.5.2 Revisit Policy; 12.5.3 Politeness Policy; 12.5.4 Combining Policies; 12.6 Evaluation; 12.6.1 Evaluating Network Usage; 12.6.2 Evaluating Long-Term Scheduling; 12.7 Trends and Research Issues; 12.7.1 Crawling the “Hidden” Web; 12.7.2 Crawling with the Help of Web Sites; 12.7.3 Distributed Crawling; 12.8 Bibliographic Discussion
13 Structured Text Retrieval (with Mounia Lalmas)
13.1 Introduction; 13.2 Structuring Power; 13.2.1 Explicit vs. Implicit Structure; 13.2.2 Static vs. Dynamic Structure; 13.2.3 Single Hierarchy vs. Multiple Hierarchies; 13.3 Early Text Retrieval Models; 13.3.1 Model Based on Non-Overlapping Lists; 13.3.2 Model Based on Proximal Nodes; 13.3.3 Ranking Structured Text Results; 13.4 XML Retrieval; 13.4.1 Challenges in XML Retrieval; 13.4.2 Indexing Strategies; 13.4.3 Ranking Strategies; 13.4.4 Removing Overlaps; 13.5 XML Retrieval Evaluation; 13.5.1 Document Collections; 13.5.2 Topics; 13.5.3 Retrieval Tasks; 13.5.4 Relevance; 13.5.5 Measures; 13.6 Query Languages; 13.6.1 Characteristics; 13.6.2 Classification of XML Query Languages; 13.6.3 Examples of XML Query Languages; 13.7 Trends and Research Issues; 13.8 Bibliographic Discussion
14 Multimedia Information Retrieval (by Dulce Ponceleón and Malcolm Slaney)
14.1 Introduction; 14.1.1 What Is Multimedia?; 14.1.2 Multimedia IR; 14.1.3 Text IR versus Multimedia IR; 14.2 The Challenges; 14.2.1 The Semantic Gap; 14.2.2 Feature Ambiguity; 14.2.3 Machine-Generated Data; 14.3 Content-Based Image Retrieval; 14.3.1 Color-Based Retrieval; 14.3.2 Texture; 14.3.3 Salient Points; 14.4 Audio and Music Retrieval; 14.4.1 Fingerprinting; 14.4.2 Speech Recognition; 14.4.3 Speaker Identification; 14.4.4 Spoken Document Retrieval; 14.4.5 Audio Basics; 14.5 Retrieving and Browsing Video; 14.5.1 Video Abstracts; 14.5.2 Static Summaries; 14.5.3 Mosaics and Salient Stills; 14.5.4 Dynamic Summaries; 14.5.5 Interactive Summaries; 14.5.6 Visual vs. Audio Browsing; 14.5.7 Evaluating Summaries; 14.6 Fusion Models: Combining It All; 14.6.1 Naming Faces; 14.6.2 Naming Images; 14.6.3 Naming Audio; 14.6.4 Combining Audio and Video for AVSR; 14.6.5 Combining Audio and Video for Multimedia; 14.7 Segmentation; 14.7.1 A Video Segmentation Example; 14.7.2 Segmentation Schemes for Video; 14.7.3 Video Segmentation with Edges; 14.7.4 Speech Segmentation; 14.7.5 Segmentation Evaluation; 14.8 Compression and MPEG Standards; 14.8.1 Intensity and Sampling; 14.8.2 Color; 14.8.3 Lossy Compression; 14.8.4 Lossless Compression; 14.8.5 Temporal Redundancy; 14.8.6 Motion Prediction; 14.8.7 MPEG Standards; 14.9 Trends and Research Issues; 14.10 Bibliographic Discussion
15 Enterprise Search (by David Hawking)
15.1 Introduction; 15.1.1 Characteristics and Applications of Enterprise Search; 15.1.2 Enterprise Search Software; 15.1.3 Workplace Search; 15.2 Enterprise Search Tasks; 15.2.1 Examples of Search-Supported Tasks; 15.2.2 Search Types; 15.2.3 Studying Enterprise Search; 15.3 Architecture of Enterprise Search Systems; 15.3.1 Gathering; 15.3.2 Extracting; 15.3.3 Indexing; 15.3.4 Indexing Textual Annotations; 15.3.5 Query Processing; 15.3.6 Presentation of Search Results; 15.3.7 Security Models; 15.3.8 Federation/Metasearch; 15.4 Enterprise Search Evaluation; 15.4.1 Published Test Collections for Enterprise Search; 15.4.2 Internal Enterprise Search Evaluations; 15.4.3 Enterprise Search Tuning; 15.4.4 What Is It Reasonable to Expect?; 15.5 Potential Reasons for Dissatisfaction; 15.6 Context and Personalization; 15.6.1 Controls and Levers for Contextualization; 15.6.2 Contextualization: Local, Enterprise or Global?; 15.6.3 Privacy of Profiles; 15.6.4 Defining, Creating and Maintaining a Profile; 15.6.5 User Modeling; 15.6.6 Implicit Measures; 15.6.7 Information Filtering; 15.6.8 Social Recommender Systems; 15.7 Trends and Research Issues; 15.8 Bibliographic Discussion
16 Library Systems (by Edie Rasmussen)
16.1 The Information Environment in the Library; 16.2 Online Public Access Catalogues; 16.2.1 OPACs and Bibliographic Records; 16.2.2 Information Retrieval from the ILS; 16.2.3 Integrating the Hybrid Library; 16.2.4 OPACs and End Users; 16.2.5 ILS: Vendors and Products; 16.3 IR Systems and Document Databases; 16.3.1 Bibliographic and Full-Text Databases; 16.3.2 Content of Database Records; 16.3.3 The Online Industry: Database Vendors; 16.3.4 Information Retrieval from Document Databases; 16.4 Information Retrieval in Organizations; 16.5 Trends and Research Issues; 16.6 Bibliographic Discussion
17 Digital Libraries (by Marcos Gonçalves)
17.1 Introduction; 17.2 Defining Digital Libraries; 17.3 A General Architecture; 17.4 Fundamentals; 17.4.1 Digital Objects and Collections; 17.4.2 Metadata and Catalogs; 17.4.3 Repositories/Archives; 17.4.4 Services; 17.5 Social-Economical Issues; 17.5.1 Social Issues; 17.5.2 Economical Issues; 17.6 Software Systems; 17.6.1 Greenstone; 17.6.2 EPrints; 17.6.3 DSpace; 17.6.4 Fedora; 17.6.5 Open Digital Libraries; 17.6.6 The 5S Suite; 17.7 DL Case Studies; 17.7.1 The Networked DL of Theses and Dissertations; 17.7.2 The National Science Digital Library; 17.7.3 The ETANA-DL Archaeological Digital Library; 17.8 Trends and Research Issues; 17.8.1 Evaluation; 17.8.2 Integration; 17.8.3 Other Research Challenges; 17.9 Bibliographic Discussion
A Open Source Search Engines (with Christian Middleton)
A.1 Introduction; A.2 Search Engines; A.2.1 Preliminary Selection of Search Engines; A.2.2 Features; A.2.3 Evaluation; A.3 Methodology; A.3.1 Document Collections; A.3.2 Evaluation Tests; A.3.3 Experimental Setup; A.4 Experimental Results; A.4.1 Test A – Indexing; A.4.2 Test B – Incremental Indexing; A.4.3 Test C – Search Performance; A.4.4 Global Evaluation; A.5 Conclusions
B Biographies
References
Index |