BeBy Cloud: Smart PDF Summarization on Resource-Constrained Devices

Project name: Testing language models on devices with constrained performance

Project period: 26/09/2024 – 15/01/2025

Partner: BeBy Cloud / Share thinking

The collaboration centred on a concrete technical question: can language models for text summarization run effectively on resource-constrained edge devices? The use case was the automatic summarization of PDF documents using pre-trained language models, deployed within a Kubernetes environment. The goal was to identify which models and deployment strategies are suitable under limited hardware conditions, with particular attention to the balance between performance, memory usage, and accuracy.

HOW WE APPROACHED IT


We tested multiple pre-trained summarization models with varying sizes and performance characteristics. The evaluation covered three main areas:

🔹Hardware benchmarking: Models were tested on BeBy, a minimalist network computer built on Raspberry Pi Compute Module 4 (CM4) boards, and compared against standard laptops and GPU-enabled devices, measuring how performance and memory usage differ across these configurations.

🔹Deployment approaches: Three deployment methods were evaluated: native execution, Docker containerisation, and Kubernetes-based deployment, each assessed for its impact on performance under limited hardware conditions.

🔹Model optimization: Quantization was explored as a technique to reduce model size and memory requirements, evaluating its effect on the trade-off between performance, memory usage, and accuracy across the tested models.
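As an illustration of the Kubernetes-based deployment path, a summarization service can be pinned to edge-class resources through requests and limits on its Deployment. The manifest below is a sketch only; the names, image, and limit values are hypothetical, not the project's actual configuration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pdf-summarizer            # hypothetical service name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pdf-summarizer
  template:
    metadata:
      labels:
        app: pdf-summarizer
    spec:
      containers:
        - name: summarizer
          image: registry.example.com/pdf-summarizer:latest  # placeholder image
          resources:
            requests:
              memory: "512Mi"     # illustrative values for a CM4-class node
              cpu: "500m"
            limits:
              memory: "1Gi"       # cap the pod before it starves the node
              cpu: "1"
```

Requests tell the scheduler what a pod needs; limits are what makes constrained-hardware deployment predictable, since a model that exceeds its memory limit is terminated rather than destabilising the whole node.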
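The arithmetic behind quantization can be shown in a few lines. The sketch below illustrates symmetric 8-bit quantization of a weight vector in plain Python; it is a toy illustration of the principle, not the quantization routine applied to the models in the project.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 representation."""
    return [q * scale for q in quantized]

weights = [0.52, -1.27, 0.03, 0.89]        # toy float32 weights
quantized, scale = quantize_int8(weights)  # each value now fits in one byte
restored = dequantize(quantized, scale)
# int8 storage needs 1 byte per weight vs 4 bytes for float32,
# so the stored model shrinks roughly 4x at the cost of rounding error.
```

The trade-off measured in the project, a smaller memory footprint against a loss of precision, is visible even in this toy version: each restored weight differs from the original by at most half the quantization scale.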

A knowledge transfer workshop was also organised, covering language models, deployment options, and optimization strategies, to share the findings and technical context with the partner’s team.

“For me, the most valuable outcome of this collaboration was showing that AI deployment on constrained hardware is possible, but it requires balancing ambition with technical reality. By testing models directly in Kubernetes and on edge-like devices, we were able to identify what is feasible today and what needs further optimization for practical use.”

ALEXANDER BRECKO
Research Engineer in NLP team

WHAT WE DELIVERED

As part of the project, we developed a FastAPI backend and deployed it to the partner’s on-premise Kubernetes environment hosted on the BeBy.cloud platform. The application allows users to upload PDF files and select a summarization model, demonstrating real-world usage of the evaluated models. The collaboration resulted in a validated technical setup, performance benchmarks, a running demonstration service, and practical recommendations for future improvements.

IN THE PARTNER’S OWN WORDS

“Collaboration with KInIT has confirmed that our low-energy beby.cloud platform is fully ready for the era of Artificial Intelligence. On an 11-node cluster with total system power consumption lower than a single 50W light bulb, we achieved what was previously the domain of expensive cloud solutions. We believe that the simplicity of our approach will allow students to better understand these complex technologies and use them in their projects.”

ROBERT GALIK
CEO / Founder of BeBy.cloud
