Challenges
Although a software suite was developed to facilitate and standardize data processing, users still rely on a wide variety of related tools in parallel to carry out their work.
The scripts had to be reworked using existing tools, users retrained in good practices, and an appropriate development ecosystem set up for them.
The technical constraints of the project concerned both the variety of data formats and their volume, in a very diverse technical environment built on R (version > 3) and/or Python (version > 3): data processing, access to databases and APIs in R, data visualization, publication in R, development of R packages and Python modules, code parallelization, API development, web applications with R, and cross-cutting development tools (Git, Continuous Integration, …).
Project management, coaching and ad hoc development
Consortia provides dedicated support to ensure the project's success. The team includes a Data PMO who supervises the work of the Data Scientists and ensures that the work done is shared and transparent.
Framing and support
- Conducting technical interviews with users, collecting material, sharing interim and final reports
- Drafting of the minutes of the interviews
- Drafting of the organization documentation
- Drafting of communication materials on the evolution of the scripts and their use in accordance with good practices
- Organizing, enriching and updating the sharing spaces
- Maintenance of an activity reporting table
Development of the scripts
- Cleanup of user scripts in line with good practices, ensuring compatibility with the latest version of Antares
- Breakdown of important functions into simpler sub-functions
- Development of unit tests for the functions developed in the entities
- Standardization of file, script and object names
- Deposit of the code in the GitLab “DEVIN
- Documentation of functions and common data formats that could be grouped in R packages
- Long-term integration of functional requirements
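As an illustration of the decomposition and unit-testing steps listed above, here is a minimal sketch in Python. It is not taken from the project's scripts: the function names, the "area;load_mw" record format and the sample values are all hypothetical, chosen only to show one large routine split into small, individually testable sub-functions.

```python
from typing import Dict, Iterable, Tuple


def parse_line(line: str) -> Tuple[str, float]:
    """Turn one 'area;load_mw' record (hypothetical format) into (area, load)."""
    area, load = line.strip().split(";")
    return area, float(load)


def total_by_area(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the load of all records that share the same area."""
    totals: Dict[str, float] = {}
    for area, load in records:
        totals[area] = totals.get(area, 0.0) + load
    return totals


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Entry point: skip blank lines, parse the rest, then aggregate."""
    return total_by_area(parse_line(line) for line in lines if line.strip())


# Unit tests written as plain assert-based functions (pytest-style naming).
def test_parse_line():
    assert parse_line("fr;10.5\n") == ("fr", 10.5)


def test_total_by_area():
    assert total_by_area([("fr", 1.0), ("fr", 2.0), ("de", 4.0)]) == {"fr": 3.0, "de": 4.0}


def test_summarize_skips_blank_lines():
    assert summarize(["fr;10.5", "", "fr;1.5"]) == {"fr": 12.0}
```

Each sub-function has a single responsibility, so a failing test points directly at the step that broke. The same pattern carries over to R, for example with functions grouped in a package and tested with testthat.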
200 restructured R scripts
Steering and implementation in support of research
Rewriting of scripts
Starting from heavy, hard-to-maintain scripts: rewriting into a modular structure, standardization of naming, and implementation of unit tests
Use of the client's code repository
Despite an initial lack of knowledge capitalization and sharing, referenced and secured scripts were built through the use of GitLab
Implementation of the ecosystem
Establishment of a suitable ecosystem that makes development easier through versioning and collaborative work
Change management
- Presentation of the results of the script rewriting work and reminder of good practices
- Proposal of tools and of a user community organization to make the approach sustainable over time
- Management of the documentation sharing spaces, proactively making proposals
- Raising user awareness so that they follow good development practices going forward