Artificial Intelligence for Science
🔹 How can we help scientists write and communicate their research better? What distinguishes well-written papers from poorly written ones?
🔹 What makes an easy-to-read and logically written paper? What are the underlying linguistic patterns of well-written papers?
🔹 How can we apply automation to help academic publishers make the review process faster and more efficient?
🔹 How can we make the review process of scientific texts more objective? Can papers be evaluated based on quantified factors of writing quality?
To answer these research questions, we apply state-of-the-art machine learning techniques, applied linguistics research, and expert knowledge of scientific writing to develop new models, functions, and algorithms.
We seek to comprehensively aid researchers during the entire writing process. This goal will be achieved through our applied research, development, and innovation (R+D+i), merging the latest technological advances with established writing guidelines. Our R+D+i is embodied in WriteWise, software that will modernize scientific writing by reducing the time and effort that researchers spend writing and that journals and academic publishers spend reviewing manuscript submissions.
for Natural Language Processing
Applied to Scientific Writing
We combine machine learning and computational linguistics within the framework of natural language processing, as applied to modelling and revising the writing process and scientific texts. This line of research applies the following methodologies:
1. Novel approaches for representing textual data from scientific articles:
- Word embeddings combined with deep/machine learning models for natural language processing tasks.
- Graph-based representations
2. Novel computational approaches for analyzing scientific articles, with specific investigative focus on:
- Discourse Segmentation
- Automatic Punctuation Analysis
- Rule-based Text Mining
- Topic Modelling
- Readability/Coherence Classification
Applied to Scientific Articles
We use functional and applied discursive frameworks, combined with corpus analysis, computational linguistics, and natural language processing approaches, to empirically determine the discursive and linguistic norms and requirements of academic and scientific texts. This line of research seeks to identify and comprehend the:
1. Communicative purposes and lexico-grammatical features that constitute written texts in distinct scientific disciplines.
2. Textual and discursive foundations of academic and scientific texts.
A novel machine learning model that guides graduate students to write more organized and structured texts
Javier Vera, Hector Allende-Cid, René Venegas, Sebastián Rodríguez, Wenceslao Palma, Sofía Zamora, Fernando Lillo, Humberto González, Ashley Van Cott, and Eduardo N. Fuentes. 2018. Molecular Biology of the Cell, 29:26.
Academic writing is one of the most valuable skills a scientist can develop. A primary challenge for graduate students is to coherently and concisely organize and present ideas within a manuscript. Writing a quality research manuscript requires transmitting the most relevant information through precise sentences that fulfill diverse communicational roles, ultimately resulting in a coherent, understandable text connected by cohesive mechanisms (e.g. lexical relationships between pairs of terms). Despite technological advances, the execution and teaching of the writing process have not similarly advanced. Therefore, a top priority for graduate programs is to implement new methodologies and technologies that aid students in communicating research advances. Through our investigation, we developed a novel, unsupervised machine-learning model applied to cell biology and biomedical texts that guides students in writing better organized and more structured texts.
Revealing the collaborative dynamics of a large-scale arXiv text collection by means of k-shell decomposition
Javier Vera, Wenceslao Palma, Hector Allende, Sebastian Rodriguez, Juan Pavez, and Eduardo Fuentes. 2019. NetSci-X: International Conference on Network Science.
This work showed how k-shell decomposition helps to understand the dynamics behind the formation of the decentralized and collaborative language community defined by the electronic repository arXiv. Our results suggest that several global patterns emerge from the microscopic activity of users sharing content. The growth of the text collection (and therefore of the associated networks) was almost completely governed by the outermost k-shells, whose size increased exponentially over time. By contrast, the size of the densest set of nodes, S_{k_max}, tends to increase only linearly. This points to an exponential accumulation of words that forces changes in the main discipline (computer science, in our case), represented by S_{k_max}. These observations were confirmed by the behavior of the (normalized) critical index k* = argmax_k |S_k|, which shifts exponentially toward the outermost network layers. Further study should describe the relationship between the index k and the number of connected components of the k-shell S_k. Moreover, it is plausible that the decentralized features of arXiv appear precisely at those external layers.
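To make the terminology concrete, the following is a minimal sketch of k-shell decomposition on a toy word network. It is not the code used in the study: the function names (`cooccurrence_graph`, `k_shells`, `critical_index`), the whitespace tokenization, and the choice to link adjacent words are all assumptions made for illustration, and the normalization of the critical index is left out because the abstract does not define it.

```python
from collections import defaultdict


def cooccurrence_graph(sentences):
    """Build an undirected graph linking adjacent words (illustrative assumption)."""
    adj = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            if a != b:  # ignore self-loops
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def k_shells(adj):
    """Return {k: set of nodes in the k-shell S_k} by iterative peeling."""
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    shells = defaultdict(set)
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to the k-shell.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via another neighbour
            remaining.remove(v)
            shells[k].add(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return dict(shells)


def critical_index(shells):
    """Unnormalized k* = argmax_k |S_k|: the index of the largest shell."""
    return max(shells, key=lambda k: len(shells[k]))


# Toy text: a triangle a-b-c with one pendant word d attached to a.
graph = cooccurrence_graph(["a b c a d"])
shells = k_shells(graph)
k_max = max(shells)  # index of the densest shell, S_{k_max}
k_star = critical_index(shells)
```

In this toy network the pendant word falls in the 1-shell and the triangle forms the 2-shell, so the densest shell is also the largest one. In a growing collection such as the one described above, tracking `len(shells[k])` per snapshot would show how the outer shells grow relative to S_{k_max}.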
Sentence encoders as a method for helping users identify and improve semantic similarity in bio-medical text
Brayn Díaz, Juan Pavez, Sebastian Rodríguez, Wenceslao Palma, Hector Allende-Cid, Rene Venegas, and Eduardo N. Fuentes. 2019. 5th Workshop on Automatic Text and Corpus Processing.
We demonstrated the effectiveness of both the Universal Sentence Encoder (USE) and BioSentVec as methods for helping users identify and improve semantic similarity between sentences in bio-medical texts. The shared tendencies between the models support sequential similarity as a metric for evaluating a text's cohesion. With both methods, outliers can be easily spotted, and specific modifications to the sentences can then be carried out depending on the type of outlier.
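The idea of sequential similarity can be sketched as follows: embed each sentence, compute the cosine similarity between each consecutive pair, and flag pairs whose similarity is unusually low. This is only an illustration under stated assumptions. The `embed` function is a bag-of-words stand-in, not USE or BioSentVec, and the outlier rule (more than one standard deviation below the mean) is a hypothetical threshold, not the criterion used in the study.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def embed(sentence):
    """Stand-in encoder: bag-of-words counts (NOT USE or BioSentVec)."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sequential_similarity(sentences):
    """Similarity of each sentence to the one that follows it."""
    vectors = [embed(s) for s in sentences]
    return [cosine(a, b) for a, b in zip(vectors, vectors[1:])]


def low_outliers(similarities, z=1.0):
    """Indices of transitions more than z std devs below the mean (assumed rule)."""
    if len(similarities) < 2:
        return []
    mu, sigma = mean(similarities), pstdev(similarities)
    return [i for i, s in enumerate(similarities) if s < mu - z * sigma]


text = [
    "the cell divides",
    "the cell grows",
    "stocks fell sharply",
    "stocks fell again",
]
sims = sequential_similarity(text)
breaks = low_outliers(sims)  # index i marks the transition from sentence i to i + 1
```

Here the transition from the second to the third sentence shares no vocabulary, so it is the only one flagged. Swapping `embed` for a real sentence encoder would leave the rest of the pipeline unchanged, since only the vector type and the `cosine` implementation depend on it.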
WriteWise: software that guides scientific writing
Eduardo N. Fuentes, Hector Allende-Cid, Sebastián Rodríguez, Rene Venegas, Juan Pavez, Wenceslao Palma, Ismael Figueroa, Sofia Zamora, Brayn Diaz, and Ashley VanCott. 2019. 5th Workshop on Automatic Text and Corpus Processing.
WriteWise is the first commercially available advanced platform that gives users help and feedback for improving the writing of scientific papers. This is made possible by the development of advanced textual data representations at different linguistic levels (e.g. words, sentences) using cutting-edge machine-learning models and applied linguistics research.