A collaborative robot is a robot intended for direct human-robot interaction within a shared workspace, or where humans and robots are in close proximity. For safety reasons, these robots typically run at reduced speed compared with traditional industrial robots. On the other hand, the persistent presence of the human worker can bring several benefits in specific applications. The activation of safety mechanisms usually stops the motion of these robots or reduces their productivity, as well as that of other machines and robots involved in the same production process. To improve productivity in these situations, we have proposed an online trajectory optimisation method for collaborative robots that reduces the number of interventions of the safety functions.
How many times do we feel exhausted after an intense workday? Maybe we did our best at work and then we feel like our own physical and mental resources just disappeared… or we feel like we weren’t able to operate at maximum efficiency, inevitably collapsing into a heap of perceived failure.
In today’s society the modern work paradigm often seems to ask us to do more in less time, and do it well. However, this “fast-and-well” work culture intrinsically incorporates the elements that, in the long run, can lead us to be less productive, or undermine our psycho-physiological safety. This might not only entail serious productivity loss but also lead workers to burnout, overstress and absenteeism.
So, where can we buy the ticket for a workday filled with less stress and more throughput? Is the idea of simultaneously improving both workers’ well-being (reducing stress in the workplace) and job performance just a utopia? Although this may seem unrealistic, it’s not. Today the collaborative robot (a.k.a. cobot) can be a key ally to help the industrial shop-floor worker achieve both these objectives!
How? The answer lies in the worlds of game theory, strategic decision making and reinforcement learning. But let’s proceed one step at a time. According to psychophysiological theory, the relationship between stress and performance is described by an inverted U-shaped curve, as illustrated hereafter. This means that too little stress results in boredom and demotivation (calm) and does not boost our productivity, while moderate stress levels are beneficial in terms of performance (eustress), but only up to a certain point. Beyond that point, further stress has a negative impact on both our safety and performance (distress), leading to mental exhaustion and poor productivity.
So, stress and performance are strongly interrelated and staying on top of the curve (where just the right amount of stress allows us to achieve the peak performance) is a delicate balancing act and –of course– a matter of trade-offs and equilibria!
Wouldn’t it be great if the cobot could help us reach this optimal stress-performance point? Here’s how to do that.
At Politecnico di Milano, we enabled the cobot to autonomously adjust its behaviour during the interaction with the human co-worker to simultaneously increase his/her performance and mitigate work-related stress.
The very first ingredient to let the cobot realize this strategy is to monitor in real-time the stress and the performance of the human worker. Then, to express the trade-off between the human stress minimization and performance maximization, we borrow some elements of game theory.
Indeed, game theory is the science that studies the optimal decision-making of competing and independent agents (a.k.a. players) in strategic frameworks. The core of game theory is the so-called ‘game’, which is used as a model of the interaction between rational players. Each player is then a strategic decision-maker within the situation exemplified by the game. Given the circumstances that can arise within the game, each player decides which action to take among a predefined set. The combination of simultaneous game actions played by the agents is called ‘game outcome’. This represents a possible state that the players can reach during their interaction. For each game outcome, each agent receives a payout (a.k.a. ‘payoff’ or ‘utility’), i.e. a numeric value quantifying the player’s level of satisfaction with respect to that outcome.
In human-robot collaborative frameworks, typically, a human operator and a cobot cooperate to accomplish a certain task. Clearly, the way they behave during cooperation influences the human attitudinal state in terms of stress and performance. Unfortunately, this influence cannot be predicted a priori.
So, game theory provides the theoretical framework to model how the actual human-robot interaction is affecting the stress and performance of the human worker. To build the model of this trade-off, we reinterpret the interaction between human and robot as a game between two self-interested agents: H and R. These express the two competing aspects of any working framework: being productive on one side, and keeping stress low on the other. As displayed in the following figure, we assume that the goal of player R is to maximize productivity and that of player H is to minimize stress. Then, we use the game actions to express the possible behaviours adopted by the human and the robot during the interaction. Indeed, we assume that each player can either adapt to the other one (i.e. play action D), when it aligns with the goal of the other player, or not adapt (i.e. play action ND), in the opposite case.
So, this game-theoretic model allows us to evaluate the admissible stress-performance compromises (outcomes) that the human-robot interaction can produce. Besides, it also lets us identify the particular game outcome (Nash equilibrium) that represents the local equilibrium state of the game and a “no regrets” situation for the agents. This is because, by definition, the Nash equilibrium is a game outcome in which no player can increase its payoff by changing its action unilaterally. Thus, in the considered case, the Nash equilibrium gives us the chance to identify the local stress-performance equilibrium compromise.
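To make the idea of a Nash equilibrium more concrete, here is a minimal Python sketch of a two-player game between H and R with the actions D and ND. The payoff numbers are invented purely for illustration (they are not the values used in our experiments), and the helper name pure_nash_equilibria is hypothetical. The function simply checks, for every outcome, whether either player could gain by changing action unilaterally.

```python
from itertools import product

ACTIONS = ("D", "ND")  # adapt / do not adapt, as named in the text

# Hypothetical payoffs, invented for illustration only.
# PAYOFFS[(h_action, r_action)] = (payoff of H, payoff of R)
PAYOFFS = {
    ("D", "D"): (3, 3),
    ("D", "ND"): (1, 2),
    ("ND", "D"): (2, 1),
    ("ND", "ND"): (0, 0),
}


def pure_nash_equilibria(payoffs, actions=ACTIONS):
    """Return the outcomes where no player gains by deviating unilaterally."""
    equilibria = []
    for h, r in product(actions, repeat=2):
        h_pay, r_pay = payoffs[(h, r)]
        # H keeps R's action fixed and tries every alternative of its own.
        h_is_best = all(payoffs[(alt, r)][0] <= h_pay for alt in actions)
        # R keeps H's action fixed and tries every alternative of its own.
        r_is_best = all(payoffs[(h, alt)][1] <= r_pay for alt in actions)
        if h_is_best and r_is_best:
            equilibria.append((h, r))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFFS)
print(equilibria)
```

With these made-up payoffs the only "no regrets" outcome is the one where both players adapt; a different payoff table would of course give a different equilibrium.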
But how to make the cobot apply the right behaviour to optimize this trade-off?
This time the answer is reinforcement learning. Basically, we reward or penalize the robot behaviour in proportion to its positive or negative influence on the human attitudinal state in terms of stress and performance. The game model serves to evaluate this effect. In our Lab we applied this approach and validated it in a real human-robot collaborative assembly task, as shown in the following video.
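The reward-and-penalty idea can be sketched in a few lines of Python. What follows is a deliberately simplified, bandit-style stand-in for reinforcement learning, not the algorithm used in our experiments: the pace levels, the simulated_human model, the reward formula and the stress_weight parameter are all assumptions made up for this example. The robot tries different paces, observes (simulated) stress and performance, and keeps a running estimate of how rewarding each pace is.

```python
import random

PACES = (0.5, 1.0, 1.5)  # hypothetical pace levels: slow, medium, fast


def simulated_human(pace, rng):
    """Toy stand-in for real-time stress/performance monitoring (invented model)."""
    level = pace / max(PACES)
    stress = level ** 3 + rng.gauss(0.0, 0.01)   # stress rises sharply at high pace
    performance = level + rng.gauss(0.0, 0.01)   # throughput rises with pace
    return stress, performance


def reward(stress, performance, stress_weight=1.0):
    """Reward productive outcomes and penalize stressful ones."""
    return performance - stress_weight * stress


def learn_pace(episodes=600, epsilon=0.2, seed=0):
    """Epsilon-greedy learning of the pace with the highest average reward."""
    rng = random.Random(seed)
    value = {pace: 0.0 for pace in PACES}  # running mean reward per pace
    count = {pace: 0 for pace in PACES}
    for _ in range(episodes):
        if rng.random() < epsilon:
            pace = rng.choice(PACES)           # explore a random pace
        else:
            pace = max(PACES, key=value.get)   # exploit the best estimate so far
        stress, performance = simulated_human(pace, rng)
        count[pace] += 1
        # Incremental update of the mean reward observed for this pace.
        value[pace] += (reward(stress, performance) - value[pace]) / count[pace]
    return max(PACES, key=value.get), value


best_pace, values = learn_pace()
print(best_pace, values)
```

In this toy model the medium pace gives the best compromise, because the fastest pace buys a little more throughput at the cost of much more stress. On a real system the stress and performance signals would come from measurements rather than from a formula.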
During the experimental campaign, we compared the effects of three different techniques in terms of human stress and productivity:
- the full strategy just described (CPS), where the cobot autonomously adjusted the production rhythm (a.k.a. pace of interaction) to jointly optimize both human stress and performance;
- a strategy (NC) where the robot simply followed the production rhythm decided by the human, without optimizing productivity or stress;
- a strategy (CP) where the robot varied the pace of interaction to solely stimulate the human productivity, regardless of his/her stress.
The results, illustrated hereafter, revealed that only the proposed strategy (CPS) was effective in simultaneously achieving the highest productivity level and a significant stress mitigation. Besides, they also showed that a suitable variation of the pace of interaction, applied by the cobot, is key to achieving the desired optimization.
- G. Weiss (ed.) – “Multiagent Systems”, The MIT Press, Cambridge, Massachusetts, 2013.
- C. Messeri, G. Masotti, A. M. Zanchettin, P. Rocco – “Human-Robot Collaboration: Optimizing Stress and Productivity Based on Game Theory”, IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8061-8068, Oct. 2021.
The word “cobot” denotes a robot optimised for collaboration with humans. Traditional industrial robotics guarantees high efficiency and repeatability for mass production, but it lacks the flexibility to deal with fast changes in consumer demand. Humans, on the other hand, can cope with such uncertainty and variability, but they are limited by their physical capabilities in terms of repeatability, strength, endurance, speed etc. Human-robot collaboration is a productive balance that captures the benefits of both industrial automation and human work.
In traditional automation, decisions are frequently driven by PLC logic. In discrete manufacturing, the problem arises of how to choose when two or more possibilities are simultaneously available.
Precedence rules are normally adopted. Sometimes, the definition of these rules is based on a priori knowledge of the system. Most often, they rely on the intuition of the programmer, who implements simple tie-breaking rules with no clear foundation in terms of optimality.
Static job scheduling determines a priori which operations have to be executed by the human and which by the robot. This methodology can be useful when changes in the workplace are not observable, when agent performance is not measurable, or when the system is observable and measurable but the agents are no longer controllable once the task has begun.
But how about allowing robots to learn optimal decisions from experience? … moreover, what if the robot can learn by running what-if analyses within a digitalised environment (i.e. a digital twin)?
By collecting production data from the physical system, the digital twin can progressively tune its parameters so as to fit the actual behaviour of the system.
Based on these parameters, simulations or what-if analyses can be run to predict the effects of decisions, and select which decision will actually provide the best outcome in terms of performance or productivity. The key idea is sketched in the following.
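As a purely illustrative aside, the two steps just described (fit the twin to production data, then run what-if analyses) can be sketched in Python as follows. This is a toy under stated assumptions, not our implementation: the DigitalTwin class, the task names, the logged durations and the makespan criterion are all invented for the example, and only job allocation (not sequencing) is considered.

```python
from itertools import product


class DigitalTwin:
    """Toy twin: mean task durations per agent, fitted from logged data."""

    def __init__(self):
        self.total = {}  # (agent, task) -> sum of observed durations
        self.count = {}  # (agent, task) -> number of observations

    def observe(self, agent, task, duration):
        """Tune the twin with one production record."""
        key = (agent, task)
        self.total[key] = self.total.get(key, 0.0) + duration
        self.count[key] = self.count.get(key, 0) + 1

    def expected_duration(self, agent, task):
        key = (agent, task)
        return self.total[key] / self.count[key]

    def makespan(self, allocation):
        """Predicted completion time when the agents work in parallel."""
        load = {}
        for task, agent in allocation.items():
            load[agent] = load.get(agent, 0.0) + self.expected_duration(agent, task)
        return max(load.values())


def best_allocation(twin, tasks, agents=("human", "robot")):
    """What-if analysis: simulate every allocation and keep the fastest one."""
    candidates = (dict(zip(tasks, choice))
                  for choice in product(agents, repeat=len(tasks)))
    return min(candidates, key=twin.makespan)


twin = DigitalTwin()
# Made-up production log: (agent, task, duration in seconds).
for record in [("human", "pick", 4.0), ("human", "pick", 6.0),
               ("human", "screw", 8.0), ("human", "inspect", 3.0),
               ("robot", "pick", 3.0), ("robot", "screw", 12.0),
               ("robot", "inspect", 6.0)]:
    twin.observe(*record)

plan = best_allocation(twin, ["pick", "screw", "inspect"])
print(plan, twin.makespan(plan))
```

Exhaustive enumeration is only viable for a handful of tasks; it is used here just to show the what-if principle of comparing simulated outcomes before acting on the real system.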
Reinforcement learning is the process adopted to learn what to do (the policy) based on experience. While typical demonstrations of such a method rely on an initial trial-and-error phase on the real system, learning on the digital replica of the system has obvious advantages, including a faster learning rate.
In our Lab we ran a series of verification tests in a collaborative human-robot assembly scenario. The product to be assembled is the Domyos Shaker 500 ml produced by Decathlon. The robot and the operator collaborate in assembling six identical products. Job allocation and job sequencing problems are dynamically optimised.
Below, a video of the application.
- G. Fioravanti, D. Sartori, “Collaborative robot scheduling based on reinforcement learning in industrial assembly tasks”, MSc Thesis at Politecnico di Milano, 2020
- R. S. Sutton, A. G. Barto, “Reinforcement Learning: An Introduction”. MIT Press, 1998
How many times, at the grocery shop, while waiting for our turn to be served, have we asked ourselves: should I just queue and wait, or should I swing by another aisle in the meantime? Well… just a simple question that nevertheless entails quite a bit of reasoning: how fast are the attendants in serving other people? How much time do I have without losing my turn?
Now, let’s virtually move this paradigm to the factory of the future. Our guest star is now a collaborative robot that has to work out when its human co-worker will require its assistance.
This is what we are currently doing in our lab at Politecnico di Milano to allow the robot to answer the following questions:
- which activity is the human operator more likely to perform next?
- what is the time when an activity requiring my assistance is expected to be initiated by the human?
But let’s proceed step by step…
The first ingredient that we need is an efficient way to categorise human actions. For this goal, we used some Bayesian statistics and some computer vision. The result is shown in this video: the robot completes the operation initiated by the human.
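To give a flavour of how Bayesian statistics can categorise what the human is doing, here is a minimal Python sketch of a recursive Bayes update over a few candidate actions. It is an illustration under assumptions, not our actual classifier: the action names, the 2D target locations, the Gaussian-shaped likelihood with its sigma value and the tracked hand positions are all made up.

```python
import math

# Hypothetical 2D locations associated with each candidate action (invented).
TARGETS = {
    "pick_part": (0.0, 0.5),
    "use_tool": (0.5, 0.0),
    "hand_over": (0.5, 0.5),
}


def likelihood(hand, target, sigma=0.2):
    """Gaussian-shaped likelihood: hand positions near a target support it."""
    dist_sq = (hand[0] - target[0]) ** 2 + (hand[1] - target[1]) ** 2
    return math.exp(-dist_sq / (2 * sigma ** 2))


def bayes_update(prior, hand):
    """One recursive Bayes step: posterior proportional to likelihood x prior."""
    unnormalised = {action: likelihood(hand, TARGETS[action]) * p
                    for action, p in prior.items()}
    total = sum(unnormalised.values())
    return {action: value / total for action, value in unnormalised.items()}


belief = {action: 1 / len(TARGETS) for action in TARGETS}  # uniform prior
# Made-up hand positions, as a vision system might report them over time.
for hand in [(0.2, 0.2), (0.3, 0.35), (0.45, 0.45)]:
    belief = bayes_update(belief, hand)

most_likely = max(belief, key=belief.get)
print(most_likely, belief)
```

Each new observation sharpens the belief, so the robot can commit to the most likely action before the human has finished the movement.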
Now we have correctly characterised what the human is doing, but what’s next? Well, based on some machine learning algorithms, and specifically based on pattern recognition, we were also able to predict the next sequence of actions and their duration. The result is shown in the next video: the robot is autonomously responsible for a quality control task, while the operator is involved in some assembly operations. As soon as the human completes his task, the collaborative phase can start and the robot is ready to help. The promptness of the robot is achieved thanks to the observation of previous executions and allows the robot itself to be ready when required by the human.
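The prediction step can be illustrated with a small Python sketch. Note that it swaps in a simple first-order Markov model plus average durations as a stand-in for the pattern recognition algorithm mentioned above; the ActivityPredictor class, the action names and the logged durations are hypothetical. The model learns from previous executions which action usually follows which, and estimates how long the robot has before the collaborative phase starts.

```python
from collections import Counter, defaultdict


class ActivityPredictor:
    """Toy first-order Markov model of human activity sequences."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # action -> counts of next actions
        self.durations = defaultdict(list)       # action -> observed durations (s)

    def fit(self, executions):
        """executions: past cycles, each a list of (action, duration) pairs."""
        for cycle in executions:
            for action, duration in cycle:
                self.durations[action].append(duration)
            for (current, _), (following, _) in zip(cycle, cycle[1:]):
                self.transitions[current][following] += 1

    def predict_next(self, action):
        """Most frequently observed successor, or None if never observed."""
        counts = self.transitions.get(action)
        return counts.most_common(1)[0][0] if counts else None

    def expected_duration(self, action):
        samples = self.durations[action]
        return sum(samples) / len(samples)

    def time_until(self, current, target, max_steps=10):
        """Expected time before `target` starts, assuming `current` has just
        begun and the human follows the most likely sequence."""
        elapsed, action = 0.0, current
        for _ in range(max_steps):
            elapsed += self.expected_duration(action)
            action = self.predict_next(action)
            if action is None:
                return None
            if action == target:
                return elapsed
        return None


predictor = ActivityPredictor()
predictor.fit([  # made-up log of three previous executions
    [("assemble_base", 10.0), ("fit_cover", 6.0), ("collaborative_screwing", 8.0)],
    [("assemble_base", 12.0), ("fit_cover", 4.0), ("collaborative_screwing", 9.0)],
    [("assemble_base", 11.0), ("collaborative_screwing", 7.0)],
])
next_action = predictor.predict_next("assemble_base")
slack = predictor.time_until("assemble_base", "collaborative_screwing")
print(next_action, slack)
```

The estimated slack is what lets the robot decide whether it can start or finish its own task before the human needs help.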
The algorithm has been compared to a purely reactive approach, in which the robot always starts its own task unless the human has already initiated the collaborative phase. The proactive behavior outperforms the purely reactive one by reducing the cycle time and its variability, hence moving closer to perfect production leveling (or heijunka, 平準化, in Japanese).
- A.M. Zanchettin, P. Rocco – “Probabilistic inference of human arm reaching target for effective human-robot collaboration”, IROS 2017, Vancouver (Canada), September 24th – 28th, 2017.
- A.M. Zanchettin, A. Casalino, L. Piroddi, P. Rocco – “Prediction of human activity patterns for human-robot collaborative assembly tasks”, IEEE Transactions on Industrial Informatics.