WEBSOM for Textual Data Mining

Artificial Intelligence Review

Abstract

New methods that are user-friendly and efficient are needed for guidance among the masses of textual information available on the Internet and the World Wide Web. We have developed a method and a tool called the WEBSOM which utilizes the self-organizing map algorithm (SOM) for organizing large collections of text documents onto visual document maps. The approach to processing text is statistically oriented, computationally feasible, and scalable: over a million text documents have been ordered on a single map. In the article we consider different kinds of information needs and tasks regarding organizing, visualizing, searching, categorizing and filtering textual data. Furthermore, we discuss and illustrate with examples how document maps can aid in these situations. An example is presented where a document map is utilized as a tool for visualizing and filtering a stream of incoming electronic mail messages.
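
To make the idea of ordering documents on a map concrete, the following is a minimal sketch of a generic self-organizing map trained on normalized word-count vectors. It is not the WEBSOM implementation: the abstract does not describe the document encoding or the training schedule, so the function names (`bag_of_words`, `train_som`, `best_matching_unit`), the 3 x 3 grid, the learning-rate and neighbourhood schedules, and the four toy documents are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def bag_of_words(text, vocabulary):
    """Encode a text as a unit-length vector of word counts (assumed encoding)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def best_matching_unit(weights, vec):
    """Return the grid position whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(weights[pos], vec)),
    )


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * radius * radius))
                for i in range(dim):
                    # Pull the unit toward the input, scaled by grid distance
                    w[i] += lr * h * (vec[i] - w[i])
            step += 1
    return weights


# Hypothetical toy collection, not data from the article.
docs = [
    "neural network training",
    "neural network learning",
    "stock market prices",
    "stock market trading",
]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [bag_of_words(doc, vocabulary) for doc in docs]
som = train_som(vectors, rows=3, cols=3)
positions = [best_matching_unit(som, vec) for vec in vectors]
for doc, pos in zip(docs, positions):
    print(pos, doc)
```

Each document ends up at one grid position, and the neighbourhood update is what tends to place documents with similar vocabulary at nearby positions. A collection of the size mentioned in the abstract would need a far more efficient encoding and training scheme than this sketch.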

Corresponding author

Correspondence to Krista Lagus.

About this article

Cite this article

Lagus, K., Honkela, T., Kaski, S. et al. WEBSOM for Textual Data Mining. Artificial Intelligence Review 13, 345–364 (1999). https://doi.org/10.1023/A:1006586221250

  • DOI: https://doi.org/10.1023/A:1006586221250