The world is drowning in a sea of online text, and readers have less and less time to evaluate the vast amount that is published. This raises the need for an automated technique that can extract a summary from a written text, such as an article or an interview, for further processing.

Most of the solutions available today are tied to specific languages and require training the algorithms on large volumes of text. But now BGN Technologies, the technology-transfer company of Ben-Gurion University of the Negev in Beersheba, has developed a novel, automated tool that summarizes text regardless of the language in which it is written. The method can be applied to summarizing articles, magazines and databases, both within the media itself and by users of such media, including libraries, academic research engines and general search engines.

The novel technology was invented by Prof. Mark Last, Dr. Marina Litvak, and Dr. Menahem Friedman at the department of software and information systems engineering at the university. It provides language-independent summaries of texts, based on a “genetic algorithm” that ranks document sentences, using statistical sentence features that can be calculated for sentences in any language; then it extracts top-ranking sentences into a summary.
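To make the idea of ranking sentences by language-independent statistics more concrete, here is a minimal sketch in Python. It is not the MUSE implementation: the three features (position, length and word frequency), the naive sentence splitter and the function names are all assumptions chosen for illustration, and the article does not say which features the researchers actually use.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_features(sentences):
    """Return (position, length, frequency) for each sentence, each in [0, 1].

    These illustrative features only count words and positions, so they
    need no dictionary or grammar for any particular language.
    """
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    counts = Counter(t for toks in tokens for t in toks)
    max_count = max(counts.values(), default=1)
    max_len = max((len(toks) for toks in tokens), default=0) or 1
    n = len(sentences)
    features = []
    for i, toks in enumerate(tokens):
        position = 1.0 - i / n            # earlier sentences score higher
        length = len(toks) / max_len      # longer sentences score higher
        # average document-level frequency of the sentence's words
        frequency = sum(counts[t] for t in toks) / (len(toks) * max_count) if toks else 0.0
        features.append((position, length, frequency))
    return features

def summarize(text, weights=(1.0, 1.0, 1.0), k=2):
    """Score each sentence as a weighted sum of its features and keep the top k."""
    sentences = split_sentences(text)
    features = sentence_features(sentences)
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original order

text = ("Cats sleep a lot. "
        "Cats and dogs are popular pets and cats often sleep near dogs. "
        "Birds sing.")
print(summarize(text, weights=(0.0, 1.0, 0.0), k=1))
```

In this sketch the weights are set by hand; in a system like the one described, they would be learned from human-written summaries.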

The method, called MUSE – Multilingual Sentence Extractor – was tested on nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. Its summarization quality, which was evaluated on four languages – English, Hebrew, Arabic, and Persian – showed a high level of similarity to summaries written by human readers.

Experimental results show that after initial training of the algorithms on an annotated collection of summarized documents in which each document is accompanied by several human-generated summaries, the software doesn’t need to be retrained on a summarization corpus in each new language, and the same sentence-ranking model can be used across several languages.
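The one-time training step can be pictured as a search for feature weights that make the top-ranked sentences agree with human choices. The sketch below uses a small genetic algorithm for that search. It is an illustration under stated assumptions, not the researchers' code: the corpus format (per-sentence feature vectors plus the indices of sentences that humans picked), the overlap-based fitness score, and all parameter values are hypothetical.

```python
import random

def top_k(features, weights, k):
    """Indices of the k sentences with the highest weighted feature sum."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def fitness(weights, corpus):
    """Average share of human-selected sentences recovered by the ranking.

    corpus is a list of (features, reference) pairs, where reference is a
    non-empty set of sentence indices chosen by human summarizers
    (a hypothetical stand-in for a real evaluation measure).
    """
    total = 0.0
    for features, reference in corpus:
        chosen = top_k(features, weights, len(reference))
        total += len(chosen & reference) / len(reference)
    return total / len(corpus)

def evolve_weights(corpus, n_features, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Evolve a weight vector; pop_size must be at least 4."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged as parents.
        population.sort(key=lambda w: fitness(w, corpus), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally nudge a weight with Gaussian noise.
            child = [g + rng.gauss(0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, corpus))

# Toy corpus: humans always picked the sentence with the highest first feature.
corpus = [
    ([[0.9, 0.1], [0.2, 0.8], [0.1, 0.5]], {0}),
    ([[0.3, 0.9], [0.8, 0.2], [0.1, 0.1]], {1}),
]
best = evolve_weights(corpus, n_features=2)
print(best, fitness(best, corpus))
```

Because the features themselves do not depend on any one language, a weight vector found this way could in principle be reused on documents in other languages, which is the property the experiments above report.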

Last explained: “Extractive summarization, which selects a subset of the most relevant sentences from a source text by ranking them using a relevance score and selecting the top-ranking sentences into a summary, is invaluable for being able to quickly summarize large quantities of text in a language-independent manner. This ability is crucial for search engines as well as other end-users, such as researchers, libraries and the media.”

Zafrir Levy, the senior vice president for business development at BGN Technologies, added, “This tool will be a valuable addition to our ability to benefit from the vast amounts of text available online. After filing a patent to protect the technology, we are currently looking for potential partners for further development and commercialization of this promising invention.”

BGN Technologies brings technological innovations from the lab to the market and fosters research collaborations and entrepreneurship among researchers and students. So far, it has established more than 100 startup companies in the fields of biotech, hi-tech and cleantech, and has initiated leading technology hubs, incubators, and accelerators. In the last 10 years, it has focused on creating long-term partnerships with multinational corporations such as Deutsche Telekom, Dell-EMC, IBM, PayPal, and Bayer, securing value and growth for Beersheba’s university as well as for the whole Negev region.

Source: Israel in the News