
Michał Pogoda-Rosikoń
5 min read
May 15, 2025
At RDK Summit 2025, I joined Damian Danyłko and Artur Gębicz to present the results of a collaboration between Comcast and bards.ai on advancing Firebolt Connect (RDKM).

Firebolt now supports automated visual testing for set-top box applications. If you are not familiar, the process is simple:
1. Define how the UI should look on a golden sample by tagging key elements (buttons, images, text, etc.).
2. Automatically verify that they render correctly on other hardware / firmware / software version configurations.
This tagging step - defining Points of Interest (POIs) - usually takes 5+ minutes per screen. With ~10 screens per app and thousands of apps, it adds up VERY quickly.
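The verification step above can be sketched in a few lines: compare the POIs detected on a device under test against the golden-sample POIs and flag mismatches. Everything here - the function names, the box format, and the IoU threshold - is an illustrative assumption, not Firebolt Connect's actual API.

```python
# Hypothetical sketch of golden-sample POI verification.
# Boxes are (x1, y1, x2, y2) in pixels; threshold is an assumed tolerance.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def verify_screen(golden_pois, detected_pois, threshold=0.7):
    """Return the names of POIs that are missing or badly misplaced."""
    failures = []
    for name, golden_box in golden_pois.items():
        detected_box = detected_pois.get(name)
        if detected_box is None or iou(golden_box, detected_box) < threshold:
            failures.append(name)
    return failures

golden = {"play_button": (100, 200, 180, 240), "title": (20, 20, 300, 60)}
detected = {"play_button": (102, 198, 182, 242)}  # "title" missing on this device
print(verify_screen(golden, detected))  # → ['title']
```

In practice the detected boxes would come from the trained detection model rather than being hand-supplied, but the pass/fail comparison stays this simple.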

We evaluated off-the-shelf solutions - including Gemini (with its native bounding box detection) and Omniparser 1/2. Neither met the accuracy or reliability we needed. So we built a custom dataset and trained a model from the ground up.
This gave us full control and production-ready results:
90%+ accuracy.
10x reduction in asset preparation time.
It shows that custom models are far from dead - at least in computer vision. But in NLP? I’m not so sure anymore. Foundation models are catching up fast.
Looking to integrate AI into your product or project?
Get a free consultation with our AI experts.