Interview with Maria Malek, researcher at the ETIS-CNRS laboratory

Maria Malek is a researcher in the ETIS-CNRS laboratory, head of the Data Science option and teacher at Cytech. She gave us an interview on the extremely rich world of opinion mining.

As part of a school-company partnership, Consortia has established a close technical relationship with Cytech by proposing a final-year project (PFE) on sentiment analysis. Supervised by Maria Malek and by Consortia (for its text mining expertise), the project is a genuine exchange of the knowledge and know-how that are essential today.

We wanted to give Maria the opportunity to share her experience and her vision on the use of NLP (Natural Language Processing) technologies.

The interview was conducted by a senior consultant within the Consortia Group who has been working for many years on subjects related to text mining, opinion mining and sentiment analysis as well as NLP techniques.

Consultant: Hello Maria, thank you for accepting this interview. First of all, could you briefly describe your background?

Maria: I defended my thesis in computer science in 1996, more specifically in the field of artificial intelligence, in collaboration with the University Hospital of Grenoble. It dealt with the design of an analogical reasoning system applied to the medical field.

After my thesis, I spent two years as a postdoctoral researcher at the Ecole des Mines de Paris. Today, I am a researcher in the ETIS laboratory and a teacher at Cytech.

I was recently co-chair of the 10th edition of the MARAMI conference (MARAMI 2019), Models & Network Analysis: Mathematical & Computer Science Approaches.

Consultant: Sentiment analysis and opinion mining are often used synonymously. How would you define them more precisely?

Maria: Opinion mining tries to identify the opinions, feelings and attitudes present in a text or a set of texts (corpus).

Sentiment analysis is concerned with the orientation of an opinion in relation to an entity or an aspect of an entity (context). This orientation is called polarity and can be, for example, positive or negative. It is used in particular in marketing, to analyze comments from Internet users or product reviews.

This type of analysis draws on several approaches from natural language processing (NLP). The simplest ones are based on detecting terms that explicitly express a judgment. In practice, we find that extracting opinions from these explicit terms alone does not give a satisfactory result. We therefore use Machine Learning or Deep Learning methods, where we build a supervised system from a labeled corpus.
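The term-detection approach described above, and its main weakness, can be sketched in a few lines of Python. This is only an illustrative sketch: the word lists and the `lexicon_polarity` function are hypothetical, invented for the example, and do not come from the project discussed in the interview.

```python
import re

# Hypothetical mini-lexicon, invented for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def lexicon_polarity(text):
    """Classify a text by counting explicit positive and negative terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("Great phone, I love the screen"))
print(lexicon_polarity("The battery is terrible"))
# Negation is invisible to a pure term count: "good" is the only
# lexicon hit here, so the sentence is scored as positive.
print(lexicon_polarity("Not good at all"))
```

The last call shows why explicit terms alone are not enough: the negated sentence is scored as positive. Cases like this are one motivation for supervised models trained on a labeled corpus.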

For social network analysis, network theory can also be used to study social interactions. These interactions and relationships can be represented by a graph, where each node represents an actor and each link a relationship. We can then study the properties of this structure, as well as the role and position of each social actor: for example, identifying influencers and their impact, or observing how an opinion propagates.
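As a minimal sketch of this graph representation, the Python snippet below builds an undirected graph from a list of interactions and ranks actors by degree centrality (the share of other actors each one is directly linked to). Degree centrality is only one simple proxy for influence, chosen here as an assumption for the example; the actor names and the interaction list are hypothetical.

```python
from collections import defaultdict

# Hypothetical interactions (for example, who replied to or mentioned whom).
interactions = [
    ("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
    ("alice", "carol"), ("erin", "dave"),
]


def build_graph(edges):
    """Return an undirected graph as a mapping: actor -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other actors that each actor is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


graph = build_graph(interactions)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

In this toy network, "bob" is linked to three of the four other actors and comes out as the most central node. Studying propagation or more subtle forms of influence would call for other measures, which this sketch does not cover.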

Complex network analysis is applied in many other fields: in biology, to identify the enzymes of a metabolic network involved in a common process; in the social sciences, to correlate profiles by centers of interest; and in anti-fraud or anti-terrorism work, among others.

Consultant: What media can we use to do such analyses?

Maria: Twitter and LinkedIn are incredibly rich sources of data, both in terms of the volumes processed and the diversity of the data available, provided that the data is well adapted to the field under study.

To ensure relevant data collection, it is essential to define the “Why” (which analysis needs) before the “How” (which techniques, which tools).

We are witnessing a hype phenomenon around opinion mining: professionals in marketing, intelligence and other fields all want to offer a service of this type to their customers.

Consultant: In your opinion, can we automate everything? For example, can we have a reliable global vision of a brand’s e-reputation or can we have reliable leads for strategic actions?

Maria: Artificial intelligence is not a sum of information, but value added to that information. It is now easier than before to run analyses that explain an opinion or characterize a mass of information (for example, the opinion is 30% negative and 70% positive) and so obtain an overall view. But for that information to really serve as the basis for a decision, we will always need the expert.
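The kind of summary mentioned in this answer (30% negative, 70% positive) is a simple aggregation over polarity labels. A minimal Python sketch, assuming each comment has already been labeled by some upstream classifier; the `polarity_shares` function and the sample labels are hypothetical:

```python
from collections import Counter


def polarity_shares(labels):
    """Return the percentage of each polarity label in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical labels for ten comments.
labels = ["negative"] * 3 + ["positive"] * 7
shares = polarity_shares(labels)
print(shares)
```

The percentages describe the collected sample only. Deciding what they mean for a brand, and whether the sample is representative, remains the expert's job.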

Consultant: So how do we get a more realistic view?

Maria: There are methods and algorithms to get an interpretation or explanation. However, it is important to apply them with care. Interpretation and explanation are often based on a subset of the data and a specific part of the data space, which increases the risk of misinterpretation. Some interpretation methods omit correlations between variables or offer only one counterfactual explanation when more than one could have been given.

It’s a matter of how many dimensions (at the data level) to consider in the algorithms: the information may be there but not accounted for, or there may be missing data.

Consultant: In your opinion, what is the outlook for opinion mining in one year, in five years, and in ten years?

Maria: In the past, algorithms were built specifically for dedicated, static systems. Nowadays, improved acquisition techniques and the growing amount of data available in real time (a daily incoming data flow, for example) allow for dynamic analysis. It is therefore difficult to choose an adequate algorithm for a given system.

The efficient operation of these systems requires the development of algorithms capable of adapting automatically. And as long as there are social networks, we will need to take into account this evolving aspect of interactions.

We can easily imagine how useful this type of method is for analyzing social networks related to a subject or to a company, both for its brand and for its products.

In addition, there are issues of language, semantics and context: is an algorithm capable of understanding the language of text messages, emoticons and/or of finding sufficient context in 140 characters? Can we do text mining in all languages?

Text mining must also contend with bias: systems cannot capture all opinions. For example, content published on blogs is very relevant because it is generally more detailed and more expressive, but is it exhaustive? Does a product rating represent all opinions, both positive and negative?

Consultant: And finally, what are the skills required to work on this topic?

Maria: It’s a new subject, with a strong R&D component, applied in a very concrete way because it’s always at the service of a project.

It is essential to think carefully about the problem, to be able to adapt and to have an analytical mind. You need skills in text mining, Machine Learning/Deep Learning and social network analysis and, of course, a command of programming languages such as Python.

Consultant: Thank you very much, Maria, for this interview!

Consortia would especially like to thank Maria Malek for her time and all the participants of the PFE, without whom this work would not have been possible.

As a researcher, Maria has published papers on social network analysis, which can be found at https://scholar.google.fr/