Abstract
New methods that are user-friendly and efficient are needed for guidance among the masses of textual information available on the Internet and the World Wide Web. We have developed a method and a tool called WEBSOM, which uses the self-organizing map (SOM) algorithm to organize large collections of text documents onto visual document maps. The approach to processing text is statistically oriented, computationally feasible, and scalable: over a million text documents have been ordered on a single map. In this article we consider different kinds of information needs and tasks related to organizing, visualizing, searching, categorizing, and filtering textual data. Furthermore, we discuss and illustrate with examples how document maps can aid in these situations. An example is presented in which a document map is used as a tool for visualizing and filtering a stream of incoming electronic mail messages.
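To make the idea of a document map concrete, the following is a minimal sketch of a basic online SOM that places a few documents, represented as toy word-count vectors, onto a small rectangular grid of map units. It is an illustration of the general SOM technique only, not the WEBSOM implementation: the vocabulary, document names, grid size, learning-rate and neighborhood schedules, and the helper names (`train_som`, `best_matching_unit`, `normalize`) are all hypothetical choices made for this sketch, and WEBSOM's own document encoding and large-scale optimizations are not reproduced here.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    return min(weights, key=lambda pos: dist2(weights[pos], x))

def train_som(data, rows=3, cols=3, steps=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a basic online SOM on a list of equal-length vectors (hypothetical defaults)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumed initialization: small random weights for every grid unit.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(steps):
        frac = t / steps
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.3)   # shrinking neighborhood radius
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            # Gaussian neighborhood on the grid, centred on the winning unit.
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights

def normalize(v):
    """Scale a vector to unit length so that document length does not dominate."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

# Toy word counts over a hypothetical vocabulary:
# ["neural", "network", "map", "stock", "market", "price"]
docs = {
    "nn1": [3, 2, 1, 0, 0, 0],
    "nn2": [2, 3, 2, 0, 0, 0],
    "fin1": [0, 0, 0, 3, 2, 2],
    "fin2": [0, 0, 1, 2, 3, 2],
}
vectors = {name: normalize(v) for name, v in docs.items()}
som = train_som(list(vectors.values()))
placement = {name: best_matching_unit(som, v) for name, v in vectors.items()}
print(placement)
```

Because every update pulls the winning unit and its grid neighbors toward the input, documents with similar word statistics tend to land on the same or nearby units, which is the property a document map relies on for browsing and filtering. For collections of the size mentioned above, the vocabulary, the map, and the text encoding would all be far larger than in this toy setup.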
About this article
Cite this article
Lagus, K., Honkela, T., Kaski, S. et al. WEBSOM for Textual Data Mining. Artificial Intelligence Review 13, 345–364 (1999). https://doi.org/10.1023/A:1006586221250