Preprint (2023)

Robotic Packaging Optimization with Reinforcement Learning

xeuron.com/p/robotic-packaging-optimization-with-reinforcement-learning

AI Metadata Extraction

Version: Extraction v4 · openai/gpt-4o-mini · 4/28/2026

Executive Summary

This research explores the application of reinforcement learning to automating robotic food packaging processes. It focuses on optimizing the conveyor belt speed to sustain productivity and quality in the face of variable product supply. The proposed framework is trained and evaluated on real-world data and outperforms the existing rule-based solution, demonstrating improvements in product handling and operational efficiency. Significant reductions in product loss, belt acceleration, and computation time validate the efficacy of integrating reinforcement learning into complex industrial control schemes. Future directions include expanding the use of real-world data in training and transferring the learned policy from simulation to physical robotic systems.

Authors

Eveline Drijver (Primary)

Cognitive Robotics Department, Delft University of Technology, 2628 CD Delft, The Netherlands

E.A.Drijver@gmail.com

Rodrigo Perez-Dattari

Cognitive Robotics Department, Delft University of Technology, 2628 CD Delft, The Netherlands

R.J.PerezDattari@tudelft.nl

Jens Kober

Cognitive Robotics Department, Delft University of Technology, 2628 CD Delft, The Netherlands

J.Kober@tudelft.nl

Cosimo Della Santina

Cognitive Robotics Department, Delft University of Technology, 2628 CD Delft, The Netherlands

C.DellaSantina@tudelft.nl

Zlatan Ajanovic

Cognitive Robotics Department, Delft University of Technology, 2628 CD Delft, The Netherlands

Z.Ajanovic@tudelft.nl

Abstract

Intelligent manufacturing is becoming increasingly important due to the growing demand for maximizing productivity and flexibility while minimizing waste and lead times. This work investigates automated secondary robotic food packaging solutions that transfer food products from a conveyor belt into containers. A major problem in these solutions is varying product supply, which can cause drastic productivity drops. Conventional rule-based approaches used to address this issue are often inadequate, leading to violations of the industry's requirements. Reinforcement learning, on the other hand, has the potential to solve this problem by learning a responsive and predictive policy from experience. However, it is challenging to utilize it in highly complex control schemes. In this paper, we propose a reinforcement learning framework designed to optimize the conveyor belt speed while minimizing interference with the rest of the control system. When tested on real-world data, the framework exceeds the performance requirement (99.8% packed products) and maintains quality (100% filled boxes). Compared to the existing solution, our proposed framework improves productivity, yields smoother control, and reduces computation time.
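To make the control problem concrete, the packaging line described above can be viewed as a small simulation environment: the agent observes the product buffer and the current belt speed, commands a continuous belt speed, and is rewarded for packed products while being penalized for lost products and abrupt speed changes. The sketch below is purely illustrative; all names, dynamics, and weights are assumptions, not the paper's actual implementation.

```python
import random


class ConveyorPackagingEnv:
    """Toy model of secondary food packaging (a hypothetical sketch,
    not the paper's environment).

    State:  product buffer fill level and current box-belt speed.
    Action: continuous box-belt speed command in [0, 1].
    The varying product supply is modeled as a random inflow rate.
    """

    def __init__(self, capacity=50, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.buffer = self.capacity // 2
        self.speed = 0.5
        return (self.buffer / self.capacity, self.speed)

    def step(self, action):
        # Clip the commanded speed to the valid range.
        target = min(max(action, 0.0), 1.0)
        # Track the speed change to penalize abrupt accelerations.
        jerk = abs(target - self.speed)
        self.speed = target
        # Stochastic inflow models the varying product supply.
        inflow = self.rng.randint(0, 4)
        packed = min(self.buffer, int(round(self.speed * 4)))
        self.buffer += inflow - packed
        # Buffer overflow corresponds to lost (unpacked) products.
        lost = max(self.buffer - self.capacity, 0)
        self.buffer = min(self.buffer, self.capacity)
        # Reward packed products; penalize losses and jerky control.
        reward = packed - 5.0 * lost - 0.5 * jerk
        obs = (self.buffer / self.capacity, self.speed)
        return obs, reward, lost
```

A policy trained against such an environment would learn to speed the belt up before the buffer overflows and slow it down when supply drops, which is the responsive-and-predictive behavior the abstract describes.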

Key Findings (20)

  1. Proposed methodology improves productivity by 0.63% compared to existing solutions.
  2. Achieved 99.94% performance, substantially above the industry requirement of 99.8%.
  3. Quality maintained at 100%, ensuring no empty or partly filled boxes leave the machine.
  4. Lost products decreased by 93.26% under the new system.
  5. Mean box belt acceleration reduced by 82.70%.
  6. Computation time improved, showing a 55.05% decrease compared to the rule-based method.
  7. Framework effectively tackles varying product supply with predictive learning.
  8. Utilized a simulated environment to validate efficiency while ensuring real-world applicability.
  9. Demonstrated improvements without violating any industry constraints.
  10. Leveraged scenario randomization for robust training under realistic conditions.
  11. Adapted policy shows increased generalization to varying inflow rates beyond training conditions.
  12. Implemented control delays and planned delays for smoother operation.
  13. Successfully integrated reinforcement learning into a complex existing control scheme.
  14. Developed a continuous action space for precise control of the box belt speed.
  15. Penalty functions were designed to encourage compliance with performance constraints.
  16. Framework allows the optimization of robotic solutions across various scenarios without re-engineering.
  17. Highlighted challenges in reinforcement learning for real-world applications.
  18. Assessment included multiple validation scenarios to gauge effectiveness.
  19. Indicated the need for further research on transferring the policy to physical machines.
  20. Future work may involve enhanced data collection and validation strategies.
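Finding 15, the use of penalty functions to encourage constraint compliance, is a standard trick in constrained reinforcement learning: hard industry requirements (such as the 99.8% packing rate and 100% box fill) are folded into the scalar reward as penalty terms. A minimal sketch follows; the term shapes and weights are entirely assumed for illustration and are not the paper's actual reward function.

```python
def shaped_reward(throughput, fill_ratio, accel,
                  min_packed=0.998, w_quality=100.0, w_smooth=0.5):
    """Penalty-shaped reward sketch (hypothetical weights).

    throughput: fraction of supplied products actually packed (0..1)
    fill_ratio: fraction of boxes leaving the machine that are full (0..1)
    accel:      magnitude of box-belt acceleration (arbitrary units)
    """
    reward = throughput
    # Soft penalty that activates only when the 99.8% packing
    # requirement is violated.
    if throughput < min_packed:
        reward -= (min_packed - throughput) * 50.0
    # Heavily penalize any box that is not completely filled,
    # reflecting the hard quality requirement.
    reward -= w_quality * (1.0 - fill_ratio)
    # Penalize acceleration to encourage smoother belt control.
    reward -= w_smooth * abs(accel)
    return reward
```

Because the penalties grow with the size of the violation, gradient-based policy optimization is steered toward the feasible region without requiring an explicitly constrained solver; the trade-off is that the weights must be tuned so constraint terms dominate the throughput term near the boundary.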

Discussion & Future Directions

The discussion emphasizes the successful integration of the proposed reinforcement learning framework into existing robotic packaging processes. It highlights the necessity for ongoing research to overcome challenges in real-world implementations and explains how the findings pave the way for enhanced automation in food packaging.
