Tools for automatic summarization of texts in Polish. State of the research and implementation works

Piotr Glenc

Abstract

This publication presents the state of research and implementation work carried out in Poland on automatic text summarization. The author describes the principal theoretical and methodological issues of automatic summary generation and then outlines selected works on the automatic summarization of Polish texts. Three IT tools that generate summaries of Polish texts (Summarize, Resoomer, and NICOLAS) are also presented, together with their characteristics derived from an experiment that included a quality assessment of the generated summaries using ROUGE-N metrics. Both the literature review and the experiment revealed a shortage of tools for automatically summarizing Polish texts, especially tools that use the abstractive approach. Most of the proposed solutions rely on the extractive method, which builds the summary from parts of the original text. There is also a shortage of tools that generate a single summary of multiple documents and of specialized tools that summarize documents from specific subject areas. Moreover, work on corpora of Polish-language text summaries needs to be intensified so that computer scientists can use them to evaluate newly developed tools.

Keywords: text summarization, Natural Language Processing, text documents, Polish language processing, automation of knowledge acquisition
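
The abstract refers to ROUGE-N metrics for assessing generated summaries. As an illustration only, the following minimal Python sketch shows how ROUGE-N is commonly computed as clipped n-gram overlap between a generated summary and a reference summary. It is not the author's evaluation code: the function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the Polish sample sentences are assumptions made for this example, and the abstract does not state which preprocessing or ROUGE implementation was used in the experiment.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N recall, precision and F1 for one summary pair.

    Assumption of this sketch: tokens are obtained by lowercasing and
    splitting on whitespace. Real evaluations of Polish text would likely
    need proper tokenization and possibly lemmatization.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # which gives the clipped number of matches.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    generated = "kot siedzi na macie"   # hypothetical system summary
    reference = "kot leży na macie"     # hypothetical reference summary
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

Recall is the share of reference n-grams that also appear in the generated summary, so a higher value suggests that more of the reference content was covered. Scores of this kind measure surface overlap only and say nothing directly about fluency or factual correctness.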

About the article

DOI: https://doi.org/10.15219/em89.1513

The article is in the printed version on pages 67-77.

Read the article (PDF, in Polish)

How to cite

Glenc, P. (2021). Narzędzia do automatycznego streszczania tekstów w języku polskim. Stan badań naukowych i prac wdrożeniowych. e-mentor, 2(89), 67-77. https://doi.org/10.15219/em89.1513