2022.07.20

Interview Series: Olga Domanova

By Jesua Epequin

 

The French Tech Beijing team sat down with Olga Domanova, a French data scientist, to get her insights on AI, China, and data science applications.

01. What was your background before relocating to China?

My background is a mixture of informatics and biology. I have focused a big part of my career on machine vision: teaching machines to capture and interpret information from image and video data.

AI is an exciting area. “Data scientist” was not a mainstream position when I began my career. Becoming one was a gradual process, a natural transition from my background in life science informatics.

I earned my Bachelor’s in Biotechnology at the Saint Petersburg State Technical University. In 2007 I relocated to Bonn, where I obtained a Master’s degree in Life Science Informatics at the Rheinische Friedrich-Wilhelms Universität. This was followed by a Ph.D. in Life Science Informatics at RWTH Aachen University.

During my graduate studies I also worked for the Fraunhofer Institutes, where I was involved in cutting-edge AI applications. Fraunhofer is the biggest organization for applied sciences in Europe, with 75 institutes spread throughout Germany, each specializing in a different field of applied science. Fraunhofer’s inventions are countless; one of its best-known developments is the MP3 audio data compression process.

Afterwards, I spent some time with the French giant L’Oréal, then joined Notocord in Paris for three years. This is a pharmaceutical company (currently owned by Instem) that focuses on the acquisition, display, and analysis of physiological signals. Its key product (Notocord-hem) is used in a wide array of settings, from pharmaceutical companies to hospitals and research institutes.

 

02. What inspired you to come work as a data scientist in China?

I relocated to China almost two years ago to work for Schlumberger as an AI Engineer.

I chose China for its culture and long history. I was also attracted by the sharp professional contrast with the other countries I have lived in.

 

03. What are the main differences between doing data science in China and France? In your opinion, which one is leading in artificial intelligence?

The main difference is the speed of work. Here, it is sometimes necessary to work on weekends or put in longer hours during the week. There are, however, more opportunities in China, especially because it is undoubtedly ahead of France in artificial intelligence.

One reason China is ahead is the mentality of Chinese entrepreneurs: they are willing to make any sacrifice to succeed, which means harsh competition and long working hours. Also, thanks to the widespread use of phone apps, there is a wealth of data available in China that is hard to match.

 

04. What do you think are some promising applications of AI (in particular, image processing) yet to be implemented in China?

An interesting use of AI could be remote diagnosis. Working from pictures taken by the patient makes it possible to receive advanced medical care without traveling long distances, which is especially useful for elderly patients and people in rural areas.

However, such a system would increase the burden on specialists by expanding each doctor’s geographic coverage. Deep learning-based artificial intelligence is expected to offset that added load.

Remote diagnosis is already being used in the Republic of Congo. A venture company from Keio University School of Medicine has developed the Smart Eye Camera (SEC), an ophthalmic medical device that can be attached to smartphones. An ophthalmologist in Japan examines the images taken with the SEC’s dedicated app. In this way, ophthalmic diseases were found in about 10% of the patients, who were later treated by a local ophthalmologist.

 

05. What was the focus of your PhD research?

During my Ph.D. I mainly focused on medical image processing. Under the supervision of Prof. Thomas Berlage from Fraunhofer FIT, I developed automated quantitative methods to analyze the mechanisms of liver diseases, thus helping identify different kinds of pathologies, such as cholestasis (a decrease in bile flow from the liver).

This software quantified the distribution of biomolecules in images of liver sections, then related that distribution to liver diseases. To develop this tool I needed to be in constant communication with medical experts in order to translate their knowledge into a preprocessing pipeline. The resulting AI model reached an impressive 95% accuracy.

This tool was a major contribution to translational research (research aimed at turning results from basic research into results that directly benefit humans), and was favorably evaluated by the DFG (German Research Foundation).

 

06. What is ChemoCR?

ChemoCR is software that automatically recognizes chemical structure depictions and, using chemical knowledge, translates them back into a machine-readable chemical representation format. In artificial intelligence, such systems are known as rule-based expert systems: they use prescribed knowledge-based rules to solve a problem. The aim of an expert system is to take knowledge from a human expert and convert it into a number of hardcoded rules to apply to the input data. The project was developed at SCAI-Fraunhofer Institute.

The majority of chemical structure information in the literature is present as two-dimensional graphical representations. This presents a problem for computers. For instance, the two images below are depictions of azithromycin drawn using different tools. Before ChemoCR, a computer could not perceive this equivalence from the pictures themselves.

After the conversion process, ChemoCR would map both depictions to the following label:

CN1C(C(C(C)(C(OC(C(C(C(C(C(CC(C1)C)(C)O)OC1C(O)C(N(C)C)CC(C)O1)C)OC1CC(OC)(C)C(O)C(C)O1)C)=O)CC)O)O)C
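The rule-based idea described above can be pictured with a toy sketch. It is not ChemoCR's actual rule set: the input format (drawn strokes between named vertices, plus printed element labels), the three rules, and the function names `interpret` and `formula` are all assumptions made for illustration. The rules encode common drawing conventions: parallel strokes mean a multiple bond, an unlabeled vertex is a carbon, and unused valence is filled with hydrogens.

```python
from collections import Counter

# Typical valences used to infer hydrogens that drawings leave implicit.
VALENCE = {"C": 4, "N": 3, "O": 2}


def interpret(strokes, labels):
    """Apply hand-written chemistry rules to raw drawing primitives.

    strokes: list of (vertex, vertex) pairs, one per drawn line.
    labels:  dict mapping a vertex to the element symbol printed next to it.
    """
    # Rule 1: parallel strokes between the same two vertices form one bond
    # whose order is the number of strokes (1 = single, 2 = double).
    bonds = Counter(frozenset(stroke) for stroke in strokes)
    # Rule 2: a vertex with no printed label is a carbon atom.
    atoms = {v: labels.get(v, "C") for stroke in strokes for v in stroke}
    # Rule 3: valence not used by drawn bonds is filled with hydrogens.
    used = Counter()
    for pair, order in bonds.items():
        for v in pair:
            used[v] += order
    hydrogens = sum(max(VALENCE[el] - used[v], 0) for v, el in atoms.items())
    return atoms, bonds, hydrogens


def formula(atoms, hydrogens):
    """Molecular formula with carbon first, hydrogen second, the rest sorted."""
    counts = Counter(atoms.values())
    counts["H"] += hydrogens
    rest = sorted(el for el in counts if el not in ("C", "H"))
    symbols = [el for el in ["C", "H"] + rest if counts[el] > 0]
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in symbols)


# Two hypothetical drawings of acetone: different vertex names and stroke order.
drawing_a = interpret([(0, 1), (1, 2), (1, 3), (3, 1)], {3: "O"})
drawing_b = interpret([("o", "b"), ("a", "b"), ("b", "o"), ("c", "b")], {"o": "O"})
print(formula(drawing_a[0], drawing_a[2]))
print(formula(drawing_b[0], drawing_b[2]))
```

Both drawings print C3H6O. A matching formula is only a weak check, since different molecules can share one; a real system would emit a full structure string such as the label shown above, which also records how the atoms are connected.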

Translating graphical representations into a machine-readable format is especially useful for patents: ChemoCR can be used to check that the chemical depicted for a “new” product has not yet been covered by a patent. It can also be used to retrieve information from a chemical database for molecules for which only a drawing is available.

Some of the challenges in this project were the low quality of the data and its insufficient quantity. Another big issue was that the labels for the dataset needed to be generated by hand. This is a time-consuming but important process: labels provide the training examples from which a machine learning model learns to make predictions.

 

07. How was your experience working for L’Oréal? What kind of projects were you involved in?

My experience at L’Oréal can be summarized in two words: “very surprising”. I did not know so much technology was put into the manufacturing and testing of cosmetics.

An example is one of the major projects during my time with them: mascara testing. Before the arrival of artificial intelligence, mascaras would be tested on human models and quality would be assessed by a human judge. Currently they are applied by robotic arms to synthetic eyelashes and then analyzed by computer. Assessment by a human judge can be subjective: different experts can give divergent opinions about the same product. Introducing artificial intelligence allows for an objective evaluation.

I was part of the development of the software used in the quality inspection. Image processing was a major challenge in this project, and developing the processing pipeline required continuous exchanges with experts.

08. Can you tell us more about Schlumberger? How does it use AI? What kind of projects are you involved in?

Schlumberger is a world-leading provider of digital solutions for the oil and gas industry. The company has French roots: it was founded almost a century ago by two brothers from Alsace.

Artificial intelligence is employed at different stages of the drilling process, from finding oil sources to optimizing the efficiency of the process. Finding new sources is achieved by investigating subsurface geophysical data. This makes it possible not only to accurately map underground oil deposits but also to assess the value of the reservoir.

As for optimizing the drilling process, we have developed artificial intelligence systems to improve the efficiency of directional drilling. This is a complex process involving the remote control of alignment and force application to a very long drill string subject to variable external forces. Traditionally, preserving the proper trajectory and eliminating deviations is a task performed by expert directional drillers. However, given the great depths drills reach, there is a delay in transmission between the head of the drill and the operator. This delay slows the process, increasing risk and cost. Our artificial intelligence system ingests historical and simulation data corresponding to the information used and actions taken by expert directional drillers, and uses that data to generate decisions that result in automatic direction correction.
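One simple way to picture learning from logged expert decisions is imitation by lookup: find the past situation most similar to the current one and repeat what the expert did. The sketch below is a toy illustration of that idea only, not Schlumberger's system. The state layout (inclination error and azimuth error, in degrees), the action names, the logged records, and the function name `nearest_expert_action` are all invented for the example.

```python
import math


def nearest_expert_action(history, state):
    """Return the action the expert took in the most similar logged state."""
    # Each record is (state, action); similarity is plain Euclidean distance.
    _, action = min(history, key=lambda record: math.dist(record[0], state))
    return action


# Hypothetical log: (inclination error, azimuth error) -> expert's correction.
history = [
    ((-2.0, 0.0), "steer_up"),
    ((2.0, 0.0), "steer_down"),
    ((0.0, -2.0), "steer_right"),
    ((0.0, 2.0), "steer_left"),
    ((0.0, 0.0), "hold"),
]

print(nearest_expert_action(history, (1.8, 0.3)))
print(nearest_expert_action(history, (0.1, -0.1)))
```

With these invented records the first call returns "steer_down" and the second returns "hold". A production system would presumably rely on far richer state and a trained model rather than a lookup, but the principle of reusing recorded expert actions is the same.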

 

09. What are some of the challenges you face in your work as a data scientist?

The main challenge is the dependence on the data, particularly its quality. This includes having enough correctly labeled samples.

Domain expert advice is also crucial, especially for feature engineering: the process of using domain knowledge to extract features (characteristics, properties) from raw data. The motivation is to use these extra features to improve the performance of the machine learning algorithm, compared with supplying only the raw data. Feature engineering makes it possible to uncover the main data attributes driving prediction.
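As a small illustration of feature engineering, consider a physiological signal reduced to a list of heartbeat timestamps. The raw timestamps say little on their own, while a domain expert would immediately ask for the heart rate and how regular the beats are. The function name `heartbeat_features`, the feature names, and the sample data below are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def heartbeat_features(beat_times_s):
    """Turn raw beat timestamps (in seconds) into expert-meaningful features."""
    # Time elapsed between each pair of successive beats.
    intervals = [later - earlier
                 for earlier, later in zip(beat_times_s, beat_times_s[1:])]
    return {
        # Average rate in beats per minute.
        "mean_heart_rate_bpm": 60.0 / mean(intervals),
        # Spread of the intervals: 0 means a perfectly regular rhythm.
        "interval_variability_s": pstdev(intervals),
    }


# Hypothetical recording: one beat per second.
print(heartbeat_features([0.0, 1.0, 2.0, 3.0]))
```

A model fed these two numbers has direct access to quantities that it would otherwise need to rediscover from the raw timestamps.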

10. How is data science used to approach Covid?

Future for Care is a European accelerator supporting health-focused startups. Exscientia is a startup whose purpose is to accelerate the development of drugs. They managed to finalize the production of a Covid drug in only one year, a stunning time saving considering the process can usually take up to five years.

 

Credits: Jesua Epequin (text), Caline Chong (article), Nil Larom (editor), Hugo Menzer & Goncalves Alexandre digital-space.cn (illustrations)