BMVC 2024 – What’s hot and what’s not!

This week (25-28.11.2024) I attended the British Machine Vision Conference. For the busy people, here’s a one-word summary: CLIP. If you have a bit more time, below is a slightly longer description of my experience.

The conference covered many topics, but the biggest focus was on combining image models with large language models. CLIP is a big hit, and mixing image and word embeddings seems to have many applications, from generating new data and image search (ATLANTIS – Inderjeet Singh) to 3D scene applications (Can CLIP help CLIP in learning 3D? – Cristian Sbrolli). 3D reconstruction had a good amount of coverage as well; here Gaussian splatting steals the scene (HDRSplat – Shreyas Singh). In terms of real-world applications, the most common themes were everyday objects (open-world understanding, for example indoor scenes and objects around roads), medical images (for surgeries and diagnostics) and human pose and expression estimation.
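For readers new to the idea, here is a minimal sketch of why a shared image and text embedding space makes image search possible: a text query and each image are compared by cosine similarity, and the closest image wins. The vectors, file names and the rank_images helper below are made up for illustration. In a real CLIP setup the embeddings would come from the model's image and text encoders, and this is not the method of any specific paper mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Sort images from most to least similar to the text query (hypothetical helper)."""
    scored = [
        (name, cosine_similarity(text_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings (assumed values; real CLIP vectors have hundreds of dimensions).
image_embeddings = {
    "coral_reef.jpg": [0.9, 0.1, 0.2],
    "city_street.jpg": [0.1, 0.8, 0.3],
    "operating_room.jpg": [0.2, 0.3, 0.9],
}

# Pretend this is the text encoder's output for a query like "underwater coral".
query_embedding = [0.85, 0.15, 0.25]

ranking = rank_images(query_embedding, image_embeddings)
best_match = ranking[0][0]
print(best_match)
```

The same comparison works in the other direction (finding the best caption for an image), which is part of why this kind of shared embedding turns up in so many different applications.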

The conference organisers chose a somewhat less traditional schedule, giving priority to posters rather than talks. There were two poster sessions per day, each lasting almost 2 hours. However, this meant that the posters rotated quite quickly and were displayed only for the duration of their session.

Apart from the workshop on Earth observation, environmental topics were largely absent from the conference, with the exception of a single poster on segmenting green spaces.

Although no specific focus was given to underwater environments or imagery, I was able to find a few interesting projects that could be helpful in my research. During the first poster session, I found a project focused on image enhancement that featured an underwater image as well. It would be great to test this method against Waternet or Seathru.

There were posters presenting various denoising methods, for example Self-Supervised Real-World Denoising by Jointly Learning Visible and Invisible Noise, where the authors used smartphones and GoPro cameras to obtain the imagery. Even though they did not test backscatter removal, it is worth keeping this method in mind.

Another interesting denoising method focused on removing rain from scenes. Rain is reminiscent of backscatter, as raindrops show up as fairly large noise in images.

Multiple presentations made use of point clouds for object segmentation. Remco Royen demonstrated multiple aspects of processing point clouds with deep learning. Yuyang Zhao presented work on indoor point clouds. Although both of these presentations focused on environments different from my research topic, it was great to see methods that can be used to deal with point clouds.

Of the keynote talks, my favourite was by Federico Tombari on The 3D Revolution: Neural Representations and Diffusion Models to Understand and Synthesise the 3D World. As I have 3D models available in my research as well, I found this talk a great overview of the existing methods. The presentation touched on better 3D object reconstruction from images, 3D scene understanding and also 3D instance segmentation (very relevant!).

Workshops

The majority of the workshops took place on the last day of the conference. These once again covered a wide variety of topics. Even though my main focus is the use of computer vision in environmental settings, I am quite interested in AI safety. I think it is a very complex area to navigate, especially when data is scarce and the field is developing rapidly. As my first workshop I chose:

Media authenticity in the age of artificial intelligence

This workshop had multiple presenters talking about standards in image source verification. Invited speakers included Leonard Rosenthol presenting C2PA, Touradj Ebrahimi explaining JPEG-Trust and Andrew Tewkesbury from Airbus discussing whether satellite imagery is under threat from AI manipulation (it’s not for now!).

The second part of the workshop focused on using AI models to detect parts of images that have been manipulated by generative AI. Examples included FUSION++ (L Aljuaid, D Bhowmik) and WSWCBnet (Z Wang, C Abhayaratne).

Overall the speakers demonstrated the challenges of anticipating where generative AI will go and how we can deal with any potential issues content generation could bring. It is frankly quite surreal that we now need AI to deal with AI.

Workshop on Machine Vision for Earth Observation and Environment Monitoring

I finished the conference by attending the workshop on Earth observation. It opened with Sylvain Lobry as the keynote speaker, showcasing what this field has achieved so far and the variety of challenges that computer vision encounters when applied to environment monitoring. The projects included cauliflower monitoring, linking demographic studies with the environment and using satellite imagery to answer questions (RSVQA). This is a fascinating range of data, each with a different set of problems. Sylvain then revealed that each of the projects used parts of the same model and asked whether we could have one large foundation model to cover all use cases. For now, however, the crowd was mostly convinced that we need multiple models for various tasks rather than just a single multi-purpose one.

There were 4 talks selected from the paper submissions:

MALPOLON: A Framework for Deep Species Distribution Modeling – Theo Larcher
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks? – Jose Sosa
Automated trash screen blockage segmentation using deep learning – Remy Vandaele
Predicting Socio-economic Indicator Variations with Satellite Image Time Series and Transformer – Robin Jarry

Of these, the one closest to my project was, surprisingly, the trash screen blockage segmentation, as it also deals with pixel-wise segmentation and mass estimation. The goal of this project was to automatically detect when trash screens get blocked and need to be cleaned to prevent flooding.

The workshop finished with a presentation from Claudia Paris. She demonstrated how we can combine data from different sources (satellite images, street map images, and even citizen science) to obtain the most up-to-date data. There are already some very useful datasets, such as the LUCAS database and Sen4Map. It was quite reassuring to hear that one of the most common problems in Earth observation tasks is generalisation.

Overall, I think this workshop was extremely useful. It gave me many ideas about where my research could go and how to make better use of the data I have available, for example through image fusion.

Honourable mentions:

A couple of projects I thought were very cool and creative even if not related to my topic:

Style and Speech in Facial Animation – Jack Saunders: this is PhD research on transferring styles such as emotions in facial animation. I thought it was really well presented and engaging even for someone without a background in this topic.
Drawing Insights: Sequential Representation Learning in Comics: this project was about understanding comics and being able to fill in missing parts.

And that’s it! These were four incredibly busy days filled with nonstop content, so there was a lot to take in. I think the conference provided a very nice overview of the current state of the art, and I was able to find some great ideas that could help me move my project forward.