skip to main content
10.1145/775152.775192acmconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
Article

Adaptive on-line page importance computation

Published:20 May 2003Publication History

ABSTRACT

The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra CPU as well as disk resources (e.g. to store, maintain and read the link matrix). We introduce a new algorithm OPIC that works on-line, and uses much less resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling to the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC that also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme.We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.

References

  1. S. Abiteboul, G. Cobena, J. Masanes, and G. Sedrati. A first experience in archiving the french web. ECDL, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Abiteboul, M. Preda, and G. Cobena. Computing web page importance without storing the graph of the web (extended abstract). IEEE-CS Data Engineering Bulletin, Volume 25, 2002.]]Google ScholarGoogle Scholar
  3. Alta vista. http://www.altavista.com/.]]Google ScholarGoogle Scholar
  4. Andrei Z. Broder and al. Graph structure in the web. WWW9/Computer Networks, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Steve Chien, Cynthia Dwork, Ravi Kumar, Dan Simon, and D. Sivakumar. Link evolution: Analysis and algorithms. In Workshop on Algorithms and Models for the Web Graph (WAW), 2002.]]Google ScholarGoogle Scholar
  6. Junghoo Cho, Hector García-Molina, and Lawrence Page. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1-7):161--172, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Kai Lai Chung. Markov chains with stationary transition probabilities. Springer, 1967.]]Google ScholarGoogle Scholar
  8. Colin Cooper and Alan M. Frieze. A general model of undirected web graphs. In European Symposium on Algorithms, pages 500--511, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. Dong D. Zhang. An efficient algorithm to rank web resources. 9 th International World Wide Web Conference, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. R. Gantmacher. Applications of the theory of matrices. In Interscience Publishers, pages 64--79, 1959.]]Google ScholarGoogle Scholar
  11. Google. http://www.google.com/.]]Google ScholarGoogle Scholar
  12. T. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.]]Google ScholarGoogle Scholar
  13. H. Garcia-Molina J. Cho. Synchronizing a database to improve freshness. SIGMOD, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M.R. Henzinger J. Dean. Finding related pages in the world wide web. 8th International World Wide Web Conference, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Broder K. Bharat. Estimating the relative size and overlap of public web search engines. 7th International World Wide Web Conference (WWW7), 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G.V. Meghabghab. Google's web page ranking applied to different topological web graph structures. JASIS 52(9), 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. ACM Computing Surveys, 28(1):33--37, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web, 1998.]]Google ScholarGoogle Scholar
  20. M. Preda. Data acquisition for an xml warehouse. DEA thesis Paris 7 University, 2000.]]Google ScholarGoogle Scholar
  21. L. Page S. Brin. The anatomy of a large-scale hypertextual web search engine. WWW7 Conference, Computer Networks 30(1--7), 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Dom S. Chakrabarti, M. van den Berg. Focused crawling: a new approach to topic-specific web resource discovery. 8th World Wide Web Conference, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Giles S. Lawrence. Accessibility and distribution of information on the web. Nature, 1999.]]Google ScholarGoogle Scholar
  24. Search-engine watch. www.searchenginewatch.com/.]]Google ScholarGoogle Scholar
  25. S. Toledo. Improving the memory-system performance of sparse-matrix vector multiplication. IBM Journal of Research and Development, 41(6):711--??, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. C.J. van Rijsbergen. Information retrieval. London, Butterworths, 1979.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Lucie Xyleme. A dynamic warehouse for xml data of the web. IEEE Data Engineering Bulletin, 2001.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Xyleme. www.xyleme.com.]]Google ScholarGoogle Scholar

Index Terms

  1. Adaptive on-line page importance computation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          WWW '03: Proceedings of the 12th international conference on World Wide Web
          May 2003
          772 pages
          ISBN:1581136803
          DOI:10.1145/775152

          Copyright © 2003 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 20 May 2003

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate1,899of8,196submissions,23%

          Upcoming Conference

          WWW '24
          The ACM Web Conference 2024
          May 13 - 17, 2024
          Singapore , Singapore

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader