InsideArachnys – Q&A with Iain Rodger, Principal Data Scientist

Welcome to InsideArachnys, a series of interviews where we speak to the people behind the Arachnys platform. This month, we caught up with Iain Rodger, our Principal Data Scientist based in Scotland.

Hi Iain, can you tell us a little bit about yourself and explain your role at Arachnys?

I joined Arachnys at the tail end of 2020 as a data scientist, and my work focuses primarily on complex problems related to search relevance. We have a truly vast amount of unstructured data to work with here and many potential applications across the business, so every day is interesting!

I hold a BSc Honours degree in Physics & Astronomy from The University of Glasgow and an Engineering Doctorate in Computer Vision & Artificial Intelligence from Heriot-Watt University, where I spent 4 years embedded in a company doing applied research.

Your career as a data scientist has spanned various industries: not just banking and the wider financial sector, but also film and cybersecurity. What do you feel makes data science such a versatile job function within any organisation?

The ubiquity of data! The rise of technology and interconnectivity over the last decade has led to massive volumes of data being generated and collected across almost all areas of society. It is unsurprising, then, that the world is rapidly transitioning to a data-centric economy, where data is the primary asset to derive value from. That value can range from insight-driven decision making to bringing a data-focused product to market.

What has changed is how differently data is now being leveraged compared with traditional use cases. There are some really incredible applications out there, such as image and language generation, that seemed impossible even a few years ago. Broadly speaking, though, the key requirements to properly collect, process and analyse data remain the same regardless of the end product. In turn, this creates high demand for people who can work with data in all its various forms, as every industry wants to exploit its own data assets effectively.

One of the biggest factors that makes data science, and data roles more broadly, such a versatile job function, though, is the ability to work with unstructured data. Working with tabulated, structured data has been routine for decades and will always remain a core part of many businesses. However, most data is unstructured, and we are now seeing a large shift towards unlocking the underlying richness of text and video media, which has traditionally been difficult to work with. Increasingly easy access to data tools, compute resources and knowledge is helping to make this possible, and it is a very exciting time to work in this space.

Given your insights, have you seen an improvement in the adoption of machine learning and data handling over the years? Is the financial world up to date or lagging behind?

Yes, overall, companies are doing a much better job of adopting ML in their operations and products than they were five years ago. It helps that the domain has matured massively in a short space of time, and the growing focus on production and deployment practices has been very well received. The most important factor in successfully adopting ML and data science-led solutions is the culture of an organisation. If the key people at the top really believe in it and want to drive change to become a data-centric business, then it is much easier to achieve throughout the organisation. Working with great people definitely goes a long way too!

From my point of view, I would say finance is generally lagging behind, but catching up quickly! In retail banking, the most forward-looking and responsive area was fraud and cybersecurity. Any help offered in the form of a new technology or ML approach to combat fincrime is always welcome. For example, I led a project to profile behaviours and flag elevated risk, based on a wide variety of disparate signals from log files containing billions of events. The results were excellent at identifying anomalous risk patterns and escalating subjects for human intervention. I find this particularly interesting because the project was a few years ago, and only now are we seeing the industry shift towards client monitoring, rather than transactional observation, for risk management.

We were definitely ahead of the curve on that project, and it is encouraging to see the industry move in this direction. I suspect that retail banks will always lag behind smaller fintechs, as it is much easier to change direction as a newer, lean, technology-led business than as an established financial organisation.

What are you working on right now and what technologies/languages are involved?

I am working on improving search relevance for our adverse media offering by reducing the overall noise in search results. That means trying to reduce the number of duplicate articles, ensuring the correct entities are matched and that they have a genuine connection to the adverse terms we care about. It is a very challenging, complex problem, as it deals with huge volumes of data and the nuance contained within natural language.

I am primarily using Python and various Natural Language Processing packages, such as spaCy and NLTK, to interpret the unstructured text data. We store our media data in an Elasticsearch cluster, so the information is very easy to access within our ecosystem.
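To give a flavour of what "reducing duplicated articles" can involve, here is a minimal, illustrative sketch of one common generic technique: comparing articles by the Jaccard similarity of their word shingles (overlapping word n-grams). This is not Arachnys's actual method; the function names, the shingle size `k` and the `threshold` value are all hypothetical choices made for the example, and it uses only the Python standard library rather than spaCy, NLTK or Elasticsearch.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(articles: dict[str, str], threshold: float = 0.5, k: int = 3):
    """Yield pairs of article ids whose shingle sets overlap at or above threshold.

    threshold and k are illustrative defaults, not tuned values.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in articles.items()}
    for (id_a, s_a), (id_b, s_b) in combinations(sets.items(), 2):
        if jaccard(s_a, s_b) >= threshold:
            yield id_a, id_b


if __name__ == "__main__":
    articles = {
        "a1": "Company X director charged with fraud in London court",
        "a2": "Company X director charged with fraud in London court on Monday",
        "a3": "Quarterly results beat expectations for retailer",
    }
    print(list(near_duplicates(articles)))  # [('a1', 'a2')]
```

Comparing every pair of articles is quadratic in the number of articles, so at the data volumes described above, approximate approaches such as MinHash with locality-sensitive hashing are commonly used instead of exhaustive pairwise comparison.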

What advice would you give to data scientists who want to become more senior in their careers?

The best thing you can do to boost your career is to work on communicating your work to stakeholders and leaders in an organisation. I have seen excellent results ultimately have zero impact due to poor communication and knowledge sharing when it mattered. If you can’t explain in simple terms why your proposed solution or change is going to benefit the company, then you will struggle to gain any traction as a senior practitioner. There seems to be an overwhelming number of technically competent data scientists who struggle in this area, so you really stand out when you can present information effectively. On the flip side, when I was more junior, my favourite leaders were always the ones who displayed high levels of ownership, which I have tried to carry forward as I have progressed. It is a trait I look for in senior colleagues.

And finally, who do you look to for inspiration? Can you recommend any great books, people, TED Talks or podcasts to follow?

I tend not to read technical books now; the internet is my go-to! The world of ML and data science moves very quickly, so if I want to learn something, I will either find a good blog with examples to work through or a relevant published paper. I do regularly read some companies' excellent technology blogs, though, and the Netflix tech blog is particularly good.

For data science, I often find myself reading material from Isaac Faber, who is the Chief Data Scientist for the US Army. For more leadership and people-focused inspiration, I am a massive fan of the work of Professor Bob Sutton, a proponent of evidence-led management and of changing organisations to work better for people. Other than that, I am a voracious reader of sci-fi, and the most recent book I read reframed my worldview quite a bit. I would highly recommend The Three-Body Problem by Liu Cixin if you want to experience a bit of an existential moment!

Digital Marketing Executive at Arachnys