The problem:

OWL, which stands for online webpage layout observer, is a quality assurance tool. It detects changes in website layout that happen when new code is deployed. Some of these changes are intended, while others happen accidentally. Previously, this detection was done with a simple pixel-wise comparison of auto-generated screenshots of website parts before and after the code change. While effective, this approach makes errors at the bottom of the page hard to find: an error at the top causes cascading differences further down, and most of the screenshot ends up marked as faulty. (Can you tell what the error is in the left image at the bottom of this article, and whether there is one or more errors hidden in the pink?)
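To make that weakness concrete, here is a minimal sketch of what such a plain pixel-wise comparison looks like. The file names, threshold parameter, and helper function are illustrative assumptions, not the actual tooling that was used: a small vertical shift near the top of the page makes almost every pixel below it differ, so nearly the whole lower screenshot gets flagged.

```python
# Sketch of a plain pixel-wise screenshot comparison (illustrative, not OWL's code).
import numpy as np
from PIL import Image

def pixelwise_diff(before_path: str, after_path: str, threshold: int = 0) -> np.ndarray:
    """Return a boolean mask of pixels that differ between the two screenshots."""
    before = np.asarray(Image.open(before_path).convert("RGB"), dtype=np.int16)
    after = np.asarray(Image.open(after_path).convert("RGB"), dtype=np.int16)
    # Both screenshots are assumed to have identical dimensions here.
    return np.abs(before - after).max(axis=-1) > threshold

# mask = pixelwise_diff("before.png", "after.png")
# print(f"{mask.mean():.1%} of pixels flagged as changed")
```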


The first idea:

When we, the G+J AI Team, were approached with the task of developing a better and smarter solution for layout comparison, our initial idea was to use a Neural Network- or Transformer-based layout extraction model to segment websites. However, the available models turned out to be trained mostly on scientific paper layouts and proved unusable for websites. Fine-tuning such a model would be an option, but we don’t have the data for it. Furthermore, the computational time and cost for such a large model to process screenshots of up to 9 million pixels are very high, and downsampling is not an option, as too much detail would be lost in the process.


The solution:

This posed the question of which other, simpler Computer Vision approaches could be considered, which leads us to how OWL works. Using a simple Canny filter, OWL extracts edges from the “before” screenshot and uses them to create bounding boxes that split the layout into parts. The maximum size of these boxes adjusts dynamically to the image size, which is crucial, as we deal with images ranging from 300 thousand to 9 million pixels. These boxes are then mapped onto the “after” screenshot using correlation filtering, to find differences both within each box and in the position of the box. This way, OWL detects shifts in the layout and can distinguish genuine errors from the cascading follow-on errors that plagued the pixel-wise comparison.

Finally, OWL gives textual and visual feedback on the type of error and its location, helping developers decide whether a change is wanted or unwanted and then fix it. Not only does OWL have eyesight as good as an owl’s, it is also just as fast, thanks to the use of basic computer vision concepts. After all, you don’t need to take a sledgehammer to crack a nut, or, in the German bird-related version, you don’t need to fire cannons at sparrows. Likewise, knowing AI means knowing when to use which degree of it. (In the left image generated by OWL you will see exactly where the changes are, in the pink boxes, shaded in pink, and get the additional information, not visualised, that there is a small shift in all content afterwards.)
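For readers who want to see the idea in code, the following is a minimal sketch of the pipeline described above, written with OpenCV. The function names, the Canny thresholds, the box-size heuristic, and the correlation cutoff are illustrative assumptions, not OWL’s actual implementation.

```python
# Sketch of an OWL-style comparison: Canny edges -> bounding boxes on the
# "before" screenshot, then correlation filtering to relocate each box in
# the "after" screenshot. Parameters are illustrative assumptions.
import cv2
import numpy as np

def extract_boxes(before_gray: np.ndarray, max_box_fraction: float = 0.25):
    """Extract edges with a Canny filter and group them into bounding boxes."""
    edges = cv2.Canny(before_gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    h, w = before_gray.shape
    max_area = max_box_fraction * h * w  # box-size cap scales with the image size
    boxes = []
    for c in contours:
        x, y, bw, bh = cv2.boundingRect(c)
        if 0 < bw * bh <= max_area:
            boxes.append((x, y, bw, bh))
    return boxes

def locate_box(after_gray: np.ndarray, before_gray: np.ndarray, box):
    """Relocate a 'before' box in the 'after' screenshot via normalized cross-correlation."""
    x, y, bw, bh = box
    template = before_gray[y:y + bh, x:x + bw]
    result = cv2.matchTemplate(after_gray, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, (bx, by) = cv2.minMaxLoc(result)
    shift = (bx - x, by - y)       # positional change of this layout segment
    changed = score < 0.98         # low correlation -> content inside the box changed
    return shift, changed
```

In a setup like this, a non-zero shift with a high correlation score would correspond to the “content moved but is intact” case, while a low correlation score points to a change inside the segment itself; reporting these two signals per box is roughly how the textual and visual feedback described above can separate a single real error from the follow-on shifts below it.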