Abstract
Text mining is the extraction of useful information from large volumes of text. A text mining analysis of the global open nanotechnology literature was performed. Records from the Science Citation Index (SCI)/Social SCI were analyzed to provide the infrastructure of the global nanotechnology literature (prolific authors/journals/institutions/countries, most cited authors/papers/journals) and the thematic structure (taxonomy) of the global nanotechnology literature, from a science perspective. Records from the Engineering Compendex (EC) were analyzed to provide a taxonomy from a technology perspective.
- The Far Eastern countries have expanded nanotechnology publication output dramatically in the past decade.
- The People's Republic of China ranks second to the USA (2004 results) in nanotechnology papers published in the SCI, and has increased its nanotechnology publication output by a factor of 21 in a decade.
- Of the six most prolific nanotechnology countries (by publications), the three from the Western group (USA, Germany, France) have about eight percent more nanotechnology publications (for 2004) than the three from the Far Eastern group (China, Japan, South Korea).
- While most of the high nanotechnology publication-producing countries are also high nanotechnology patent producers in the US Patent Office (as of 2003), China is a major exception. China ranks 20th as a nanotechnology patent-producing country in the US Patent Office.
Appendix 1 – EC and SCI factor analysis
Factor analysis of a text database aims to reduce the number of words/phrases (variables) in a system, and to detect structure in the relationships among words/phrases. Word/phrase correlations are computed, and highly correlated groups (factors) are identified. The relationships of these words/phrases to the resultant factors are displayed clearly in the factor matrix, whose rows are words/phrases and columns are factors. In the factor matrix, the matrix elements Mij are the factor loadings, or the contribution of word/phrase i (in row i) to the theme of factor j (in column j). The theme of each factor is determined by those words/phrases that have the largest values of factor loading. Each factor has a positive value tail and negative value tail. For each factor, one of the tails dominates in terms of absolute value magnitude. This dominant tail is used to determine the central theme of each factor.
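The construction described above can be illustrated with a minimal sketch. It assumes an unrotated principal-component extraction from the term correlation matrix; the study's actual extraction and rotation settings are not stated here, so this is not a reproduction of its method. The toy document-by-term counts, the term names, and the helper names `factor_loadings` and `dominant_tail` are hypothetical, and "dominant" is taken to mean the tail containing the single largest absolute loading.

```python
import numpy as np

# Hypothetical toy data: 6 documents (rows) by 6 terms (columns), raw counts.
terms = ["xrd", "tem", "crystal", "film", "deposition", "thickness"]
doc_term = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)


def factor_loadings(doc_term, n_factors):
    """Return a terms-by-factors matrix M, where M[i, j] is the loading
    of term i on factor j (unrotated principal-component extraction)."""
    corr = np.corrcoef(doc_term, rowvar=False)      # term-term correlations
    eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_factors]   # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale                # eigenvector * sqrt(eigenvalue)


def dominant_tail(column, terms, top=3):
    """Pick the tail (positive or negative) with the largest absolute
    loading and return its sign plus the highest-loading terms in it."""
    pos_max = max(column.max(), 0.0)
    neg_max = max(-column.min(), 0.0)
    sign = 1 if pos_max >= neg_max else -1
    ranked = np.argsort(-sign * column)[:top]
    return sign, [terms[i] for i in ranked]


loadings = factor_loadings(doc_term, n_factors=2)
sign, theme_terms = dominant_tail(loadings[:, 0], terms)
print("Factor 1 dominant tail:", "positive" if sign > 0 else "negative")
print("Theme terms:", theme_terms)
```

In this toy case the first factor separates the instrument-related terms from the film-related terms, and the theme is read off whichever tail is reported as dominant. The sign of an eigenvector is arbitrary, so which group lands in the positive tail can differ between runs or libraries.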
Factor analyses were performed on the EC and SCI retrievals. Factor matrices ranging from 2 to 32 factors were generated, the main themes were identified, and the themes were manually categorized into a hierarchical taxonomy. The SCI taxonomy is presented first, followed by the EC taxonomy.
SCI taxonomy
Level 1
- Instruments (XRD-TEM-SEM)
- Phenomena/Properties (Crystal structure)
Level 2
- Instruments (XRD-TEM-SEM; Differential calorimetry)
- Phenomena/Properties (Crystal structure; Surface adsorption (SAM/Film deposition))
Level 3
- Instruments (XRD-TEM-SEM; Differential calorimetry; AFM)
- Phenomena/Properties (Crystal structure; Surface adsorption (SAM/Film deposition); Photoluminescence (Quantum dots); Catalysis)
EC taxonomy
For a two-factor analysis, the main thrusts are:
(1) Films
(2) Nanocomposites–Clay/Differential calorimetry
For a four-factor analysis, the main thrusts are:
(1) Films (hardness, mechanical properties)
(2) Nanocomposites–Clay/Differential calorimetry
(3) Nanoparticle formation/reaction/catalysis
(4) Microstructure (Ni/Zr/C/B)
For an eight-factor analysis, the main thrusts are:
(1) Differential calorimetry/Nanocomposites–Clay
(2) Films (temperature/thickness/deposition)
(3) XRD/TEM (size, catalysis)
(4) Ni/Cu (alloys, Fe, Co)
(5) Hardness/Mechanical properties
(6) CNT
(7) SAMs
(8) Crystal structure
These results highlight the differences between the SCI and EC databases from the factor matrix perspective, as well as the differences between document clustering-based taxonomies and factor matrix-based taxonomies. The document clustering taxonomies are organized essentially by structures (e.g., nanowires, nanotubes, nanoparticles, films) and phenomena (optics, magnetics). The SCI factor matrix taxonomies are characterized by instruments (XRD, TEM, SEM, AFM, differential calorimetry) and the quantities they measure (crystal structure, surface adsorption, photoluminescence). The EC factor matrix taxonomies are characterized by structures (films, nanocomposites, nanoparticles, microstructures).
At the first level of the factor matrix taxonomies, the science focus of the SCI, which concentrates on instrumentation and basic scientific phenomena (crystal structure), is clearly seen. The technology focus of the EC, which concentrates on structures and materials (films, nanocomposites–clay), is also evident.
At the second level, the science focus of the SCI remains the same, with additional instrumentation and measured phenomena shown. The EC focus on structures and materials continues, extending to nanoparticles and microstructure. At the third level, the focus of the EC on structures and materials continues (CNT, SAMs, alloys, mechanical properties), but some of the applied research aspects begin to emerge (XRD/TEM, crystal structure).
Cite this article
Kostoff, R.N., Stump, J.A., Johnson, D. et al. The structure and infrastructure of the global nanotechnology literature. J Nanopart Res 8, 301–321 (2006). https://doi.org/10.1007/s11051-005-9035-8
DOI: https://doi.org/10.1007/s11051-005-9035-8