Thirteenth Workshop

The thirteenth and final Hybris Workshop will take place on June 20-21, 2019 at the EUREF Campus in Berlin.

Workshop Location

The workshop room will be next to the restaurant "Grüns" (at the south end of "EUREF-Platz").

If you come by car or taxi, note that "Torgauer Str." can only be reached via Sachsendamm/Dominicusstr.

Mercure Hotel Berlin Mitte


Thursday, June 20

12:00-13:00  Welcome and Lunch
13:00-14:00  Malte Helmert, Uni Basel
Declarative Heuristics
14:00-15:30  Project A1
Verification of Non-Terminating Action Programs
15:30-16:00  Coffee Break
16:00-17:30  Project A2
Advanced Solving Technology for Dynamic and Reactive Applications
17:45-19:00  Hybris PI Meeting / Discussion of Follow-up Projects

Friday, June 21

09:00-10:00  Sheila McIlraith, University of Toronto
Reward Machines: Structuring reward function specifications and reducing sample complexity in reinforcement learning
10:00-10:30  Coffee Break
10:30-11:00  David Speck, Uni Freiburg
An Analysis of the Probabilistic Track of the IPC 2018
11:00-12:30  Project A3
Probabilistic Description Logics Based on the Aggregating Semantics and the Principle of Maximum Entropy
13:30-15:00  Project C1
Planning and Action Control for Robots in Human Environments
15:00-16:00  Final Discussion and Coffee


Malte Helmert: Declarative Heuristics

Abstract: The last two decades have seen significant advances in domain-independent planning. Besides improved scalability through better planning algorithms, several breakthroughs have been made in the theoretical understanding of classical planning heuristics. This talk presents a declarative perspective on heuristic design: telling the computer which information we want a planning algorithm to exploit rather than how to exploit it. Declarative heuristics offer a unified view of many approaches to heuristic design, allow aggregating information from different heuristics in a principled way and offer a path towards fully automated heuristic synthesis.
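To make the declarative idea concrete, here is an illustrative sketch (not taken from the talk) in the style of potential heuristics: we declare which state features carry information and assign each a numeric weight, and the heuristic value of a state is simply the weighted sum of its true features. How the weights are found, e.g. by solving a linear program enforcing admissibility constraints, is left to the solver. All feature names and weights below are invented for a toy task.

```python
# Declarative heuristic sketch: the heuristic is *declared* as a set of
# weighted state features; evaluation is just a weighted sum.
# Feature names and weights are hypothetical toy values.

def potential_heuristic(weights):
    """Build h(s) = sum of the weights of the features true in state s."""
    def h(state):
        return sum(w for feature, w in weights.items() if feature in state)
    return h

# Declared information (toy numbers): the truck being away from the goal
# costs 2, the package being at its start location costs 1.
weights = {"truck-at-A": 2, "package-in-truck": 0, "package-at-A": 1}

h = potential_heuristic(weights)
print(h({"truck-at-A", "package-at-A"}))  # 2 + 1 = 3
```

The point of the declarative view is that the same feature-and-weight specification can be handed to different back ends, or combined with other declared heuristics, without rewriting any evaluation code.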

Bio: Malte Helmert is an associate professor of computer science at the University of Basel, Switzerland. His main research interests are in classical planning and heuristic search, with an emphasis on domain-independent algorithms for synthesizing distance heuristics in factored state spaces. His research group at the University of Basel leads the development of the Fast Downward planning system. He is a long-standing member of the ICAPS community and currently serves as President of ICAPS.

Sheila McIlraith: Reward Machines: Structuring reward function specifications and reducing sample complexity in reinforcement learning

Abstract: A standard assumption in reinforcement learning (RL) is that the agent does not have access to a faithful model of the world. As such, to learn optimal behaviour, an RL agent must interact with the environment and learn from its experience. While it seems reasonable to assume that the transition probabilities relating to the agent's actions are unknown, there is less reason to hide the reward function from the agent. Artificial agents cannot inherently perceive reward from the environment; someone must program those reward functions (even if the agent is interacting with the real world). Two challenges that face RL are reward specification and sample complexity. Specification of a reward function -- a mapping from state to numeric value -- can be challenging, particularly when reward-worthy behaviour is complex and temporally extended. Further, when reward is sparse, it can require millions of exploratory episodes for an RL agent to converge to a reasonable-quality policy. In this talk, I present the notion of a Reward Machine, an automata-based structure that provides a normal-form representation for reward functions. Reward Machines can be used natively to specify complex, non-Markovian reward-worthy behavior. Alternatively, because of their automata-based structure, a variety of compelling human-friendly formal languages can be used as reward specification languages and straightforwardly translated into Reward Machines, including variants of Linear Temporal Logic (LTL) and a variety of regular languages. Furthermore, Reward Machines expose reward function structure in a normal form. The Q-Learning for Reward Machines (QRM) algorithm exploits Reward Machine structure in its learning, while preserving optimality guarantees. Experiments show that QRM significantly outperforms state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved and critically reducing the sample complexity.
This is joint work with Toronto students and colleagues Rodrigo Toro Icarte, Toryn Klassen, Rick Valenzano (now Element AI), and Alberto Camacho.
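As a minimal sketch of the idea (illustrative only; names do not come from the QRM implementation): a reward machine is a finite automaton whose transitions are labelled with environment events and carry rewards, so a task that is non-Markovian in the environment state becomes Markovian given the machine state.

```python
# Minimal reward-machine sketch. Transitions map (machine state, event)
# to (next machine state, reward); unlisted events leave the machine
# unchanged with reward 0. All state and event names are hypothetical.

class RewardMachine:
    def __init__(self, transitions, initial_state):
        self.transitions = transitions
        self.u = initial_state  # current machine state

    def step(self, event):
        self.u, reward = self.transitions.get((self.u, event), (self.u, 0.0))
        return reward

# Task: first observe "coffee", then "office" -> reward 1.
rm = RewardMachine(
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u2", 1.0),
    },
    initial_state="u0",
)

rewards = [rm.step(e) for e in ["office", "coffee", "office"]]
# rewards == [0.0, 0.0, 1.0]: visiting the office pays off only after
# the coffee has been picked up.
```

Because the machine state u is observable to the learner, an algorithm like QRM can learn one Q-function per machine state and share experience across them.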

Bio: Sheila McIlraith is a Professor in the Department of Computer Science, University of Toronto. Prior to joining U of T, McIlraith spent six years as a Research Scientist at Stanford University, and one year at Xerox PARC. McIlraith is the author of over 100 scholarly publications on a variety of topics in artificial intelligence largely related in some way to sequential decision making, knowledge representation, reasoning, and search. McIlraith is a fellow of the Association for the Advancement of Artificial Intelligence (AAAI), associate editor of the Journal of Artificial Intelligence Research (JAIR), and is a past associate editor of the journal Artificial Intelligence (AIJ). She is currently serving on the standing committee of the Stanford One Hundred Year Study of Artificial Intelligence. McIlraith served as program co-chair of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR2012), and the International Semantic Web Conference (ISWC2004). In 2011 she and her co-authors were honoured with the SWSA 10-year Award, recognizing the highest impact paper from the International Semantic Web Conference, 10 years prior.