Big data is a revolution in information technology that is not only affecting industries around the globe, but has also become one of the most important tools for advancing science. Generating the data sets that can drive science forward is today considered among the most urgent goals in the scientific community.
As part of the Wellcome Leap Project First 1000 Days research initiative to promote healthy brain development for newborn children, a group of cognitive psychologists at the Princeton Neuroscience Institute (PNI) set out to create a naturalistic dataset on infants’ everyday development using state-of-the-art machine learning tools.
Twenty families consented to participate in the study, which continuously collects video and audio data on caregiver/child interactions over 1,000 days. The data will allow researchers to ask questions about child development in real life that could not previously be addressed with standard small experimental data sets.
Given this project is the first of its kind, the PNI researchers needed a DevOps partner to help them run the data pipeline and create the infrastructure that would allow them to monitor, QA and refine a pipeline that would ultimately move over 2,000 hours of HD data per day from families through the cloud.
Data is the lifeblood of a good scientific program, and so it was vital to collect, process and store the data streamed in from participants’ homes with absolute precision and accountability. The PNI researchers had the vision of what such huge amounts of data could unlock, but managing this amount of unstructured data is best left to specialists in Cloud and DevOps.
The Beaty Consultancy team helped to shape the logging and auditing methodologies used to assert that the data emerging from the ML pipeline was accurate, complete and stored securely. We helped the team to knit together the right services in Amazon Web Services (AWS) with the right configuration, such that the flow of data was entirely event-driven and could be audited all the way from ingestion to cold storage.
The PNI scientists and Beaty Consultancy team also worked to ensure the data was kept safely and securely. We even worked with AWS’s own internal security team to have them audit our proposed solutions, such was the importance of getting this right first time.
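To give a flavour of what auditing a flow "from ingestion to cold storage" can mean in practice, here is a minimal Python sketch. It is an illustration only, not the project's actual code: the stage names, the `AuditEvent` record and the `find_gaps` helper are all hypothetical, and it simply assumes that each stage of an event-driven pipeline logs one event per recording it handles.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stage names; the real pipeline's stages are not described here.
STAGES = ("ingested", "processed", "archived")


@dataclass(frozen=True)
class AuditEvent:
    object_key: str  # identifier of one recording, e.g. a file name
    stage: str       # which pipeline stage logged this event


def find_gaps(events):
    """Return {object_key: [missing stages]} for recordings that
    were seen at some stage but did not reach every stage."""
    seen = defaultdict(set)
    for event in events:
        seen[event.object_key].add(event.stage)
    gaps = {}
    for key, stages in seen.items():
        missing = [s for s in STAGES if s not in stages]
        if missing:
            gaps[key] = missing
    return gaps


events = [
    AuditEvent("a.mp4", "ingested"),
    AuditEvent("a.mp4", "processed"),
    AuditEvent("a.mp4", "archived"),
    AuditEvent("b.mp4", "ingested"),  # never processed or archived
]
print(find_gaps(events))
```

A report like this, run on a schedule, is one simple way to confirm that everything that entered the pipeline also came out the other end.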
Due to the nature of cutting-edge research, there was no blueprint for how the data pipeline should be operated. The PNI team envisioned the creation of a database that would serve generations of scientists to come, but had no way to test novel tools, either for themselves or for those future generations.
The Beaty Consultancy team reverse-engineered the current pipeline and deployed everything again in a new development environment, exactly like the current production pipeline. We did this by scripting out the AWS estate as it was before the Beaty Consultancy team were engaged in the project, using Terraform Infrastructure as Code (IaC). This allowed the whole team to be more confident about changes we needed to make to the pipeline.
Science doesn’t stand still, and neither should the 1KD research pipeline. The Beaty Consultancy team therefore worked closely with the PNI researchers to refine, update and deploy their new ML algorithms in a controlled and repeatable manner. We have a rich background in supporting production environments across healthcare, oil and gas, cryptocurrency trading and enterprise asset management, where the client or users cannot tolerate downtime or inaccuracy. We brought these skills with us to develop safe deployment and rollback mechanisms.
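The idea behind a safe deployment with rollback can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the mechanism used on the project: `deploy_with_rollback`, `activate` and `healthy` are hypothetical names, standing in for whatever switches the live version and whatever check decides that the new version is behaving.

```python
from typing import Callable


def deploy_with_rollback(
    current: str,
    candidate: str,
    activate: Callable[[str], None],  # hypothetical: makes a version live
    healthy: Callable[[], bool],      # hypothetical: post-deploy health check
) -> str:
    """Activate the candidate version; if the health check fails or raises,
    reactivate the current version. Returns the version left active."""
    activate(candidate)
    try:
        ok = healthy()
    except Exception:
        ok = False  # a crashing health check counts as unhealthy
    if ok:
        return candidate
    activate(current)  # roll back to the known-good version
    return current


# Usage with stand-in callables: the health check fails, so we roll back.
history = []
active = deploy_with_rollback("v1", "v2", history.append, lambda: False)
print(active, history)
```

The key design choice is that the known-good version is never discarded until the new one has passed its check, so a bad release costs a rollback rather than lost data.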
The Beaty Consultancy team worked across the various organizations involved in the 1KD project: Wellcome Leap (the non-profit funding the research), the software development contractor, the PNI team and AWS themselves.
The first priorities, which spanned all these teams, were to help the PNI researchers navigate an information security audit of the pipeline, and to find cost-saving efficiencies in the pipeline to help meet budgetary requirements so the research could go ahead at the scale imagined.
As well as the technical skills we bring to the project, we also bring our whole selves. The project will last 1,000 days, and so personal and professional relationships across the team are absolutely essential. The Beaty Consultancy team works predominantly in the UTC time zone, which is five hours ahead of Princeton, but that doesn’t seem to matter when things need to be done. It is so important not to be just a techno-drone at the end of a Zoom call.
This is a key part of our culture here at Beaty Consultancy. We want to build long-lasting connections with our clients, and really help them achieve their goals in the long term – not just turn up and build some pretty architecture.
Our thanks go to the research team at PNI, especially to Liat Hasenfratz for helping us to get the story out.
If you’re dealing with critical infrastructure or data, and you need a safe and dependable team to help you manage it (and you like tea), why not give us a shout? Let’s build awesome!