Abstract
The labor-intensive task of labeling data is a serious bottleneck for many supervised learning approaches to natural language processing applications. Active learning aims to reduce the human labeling cost of supervised learning methods. Deciding when to stop the active learning process is an important practical issue in real-world applications. This article addresses the stopping-criterion problem for active learning and presents four simple stopping criteria based on confidence estimation over the unlabeled data pool: maximum uncertainty, overall uncertainty, selected accuracy, and minimum expected error. Further, to obtain a suitable threshold for a stopping criterion in a specific task, this article presents a strategy that uses a label-change factor to dynamically update the predefined threshold of a stopping criterion during the active learning process. To empirically analyze the effectiveness of each stopping criterion, we design comparison experiments on seven real-world datasets spanning three representative natural language processing applications: word sense disambiguation, text classification, and opinion analysis.
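To make the criteria concrete, the following is a minimal Python sketch of how the two uncertainty-based stopping checks and the label-change threshold update might look. It assumes a scikit-learn-style classifier exposing predict_proba and uses prediction entropy as the confidence measure; the function names (should_stop, update_threshold) and the multiplicative decay are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def entropy(probs):
        """Prediction entropy per example; probs has shape (n_examples, n_classes)."""
        eps = 1e-12  # guard against log(0)
        return -np.sum(probs * np.log(probs + eps), axis=1)

    def should_stop(model, unlabeled_X, threshold, criterion="max"):
        """Confidence-based stopping check over the unlabeled pool.

        criterion="max":     stop when even the most uncertain example falls
                             below the threshold (maximum-uncertainty idea).
        criterion="overall": stop when the mean uncertainty over the whole
                             pool falls below the threshold (overall-uncertainty idea).
        """
        probs = model.predict_proba(unlabeled_X)
        unc = entropy(probs)
        score = unc.max() if criterion == "max" else unc.mean()
        return score < threshold

    def update_threshold(threshold, prev_preds, curr_preds, decay=0.9):
        """Label-change-based threshold update (a rough reading of the paper's
        strategy, not its exact rule): if pool predictions are still changing
        between rounds, the classifier is unstable, so tighten the threshold
        to delay stopping; once predictions stabilize, leave it unchanged."""
        changed = np.mean(np.asarray(prev_preds) != np.asarray(curr_preds))
        return threshold * decay if changed > 0 else threshold

In each active learning round, one would retrain the model, call update_threshold with the pool predictions from the previous and current rounds, and then call should_stop; querying halts once the check succeeds.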