
Confidence-based stopping criteria for active learning for data annotation

Published: 29 April 2010

Abstract

The labor-intensive task of labeling data is a serious bottleneck for many supervised learning approaches to natural language processing applications. Active learning aims to reduce the human labeling cost of supervised learning. Determining when to stop the active learning process is an important practical issue in real-world applications. This article addresses the stopping-criterion issue of active learning and presents four simple stopping criteria based on confidence estimation over the unlabeled data pool: the maximum uncertainty, overall uncertainty, selected accuracy, and minimum expected error methods. Further, to obtain a proper threshold for a stopping criterion in a specific task, this article presents a strategy that uses the label change factor to dynamically update the predefined threshold of a stopping criterion during the active learning process. To empirically analyze the effectiveness of each stopping criterion, we design several comparison experiments on seven real-world datasets for three representative natural language processing applications: word sense disambiguation, text classification, and opinion analysis.
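The article defines each criterion formally in the body of the paper. Purely as an illustration (and not the authors' implementation), the sketch below shows how confidence-based stopping checks of this general kind can be computed from a classifier's posterior class probabilities over the unlabeled pool; all names here (`probs`, `threshold`, the `*_stop` helpers) are hypothetical.

```python
# Illustrative sketch only, not the authors' code. `probs` is assumed to be an
# (n_examples, n_classes) array of posterior class-probability estimates from
# the current model over the unlabeled pool.
import numpy as np

def entropy(probs, eps=1e-12):
    """Per-example prediction entropy, a common uncertainty measure."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def max_uncertainty_stop(probs, threshold):
    """Maximum uncertainty: stop once even the most uncertain example
    in the pool falls below the uncertainty threshold."""
    return entropy(probs).max() <= threshold

def overall_uncertainty_stop(probs, threshold):
    """Overall uncertainty: stop once the average uncertainty over the
    whole pool falls below the threshold."""
    return entropy(probs).mean() <= threshold

def min_expected_error_stop(probs, threshold):
    """Minimum expected error: stop once the expected error
    (1 - maximum posterior), averaged over the pool, is small enough."""
    return (1.0 - probs.max(axis=1)).mean() <= threshold

def selected_accuracy_stop(queried_pred, queried_gold, threshold):
    """Selected accuracy: stop once the model's predictions on the most
    recently queried (and then human-labeled) examples agree with the
    oracle labels often enough."""
    agreement = np.mean(np.asarray(queried_pred) == np.asarray(queried_gold))
    return agreement >= threshold
```

Under the same assumptions, the label-change-based threshold update described above could be realized by loosening or tightening `threshold` between rounds according to how many pool predictions changed label since the previous round; the article gives the actual update rule and threshold settings.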


      • Published in

        ACM Transactions on Speech and Language Processing, Volume 6, Issue 3
        April 2010, 24 pages
        ISSN: 1550-4875
        EISSN: 1550-4883
        DOI: 10.1145/1753783

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 29 April 2010
        • Accepted: 1 February 2010
        • Revised: 1 May 2009
        • Received: 1 July 2008
