Home

My name is Tom. I am a computer scientist working on two major areas of research: High-dimensional Long-horizon Planning and Intrinsic Motivation, with theory that builds upon the Options Framework of Reinforcement Learning. "High-dimensional long-horizon planning" refers to tasks where an agent must achieve sequences of goals while tracking many dynamic state variables (such as hunger or thirst), and "intrinsic motivation" refers to creating goals and incentives that are not explicitly given to the agent.

I develop new forms of Markov Decision Processes and Bellman Equations to tackle these problems. My research shows how both domains benefit from abandoning reward-maximization in the underlying objective function. I believe the next wave of advanced AI systems will arise from fundamental advances in discrete search through the high-dimensional latent spaces of neural networks, and my research contributes to this direction.

Thesis

My PhD dissertation presents a comprehensive framework showing how abandoning reward-maximization in the underlying objective function is advantageous for both the high-dimensional planning and intrinsic motivation domains. This theory makes it possible to create agents that solve challenging multi-goal tasks in high dimensions and that propose and evaluate goals deep into the future.

Key Papers

In this paper, instead of optimizing value functions, we directly optimize predictive maps of the goal-satisfaction and constraint-violation events that will occur under an agent's policy. Because we optimize the predictive maps directly, we can factorize them by the chain rule and compose individual solutions into a composite solution for a long-horizon task with many dynamic variables, such as hunger or hydration states.
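A rough sketch of the flavor of this idea (the notation here is mine, for illustration, and not necessarily the paper's): write M^{\pi}_{E}(s) for the predictive map giving the probability that event E eventually occurs under policy \pi from state s. A composite task requiring two events then factorizes by the chain rule of probability:

M^{\pi}_{E}(s) \;=\; \Pr\big(\exists\, t :\ E \text{ occurs at time } t \,\big|\, s_0 = s,\ \pi\big),

\Pr\big(E_1 \wedge E_2 \,\big|\, s_0 = s,\ \pi\big) \;=\; \Pr\big(E_1 \,\big|\, s_0 = s,\ \pi\big)\,\Pr\big(E_2 \,\big|\, E_1,\ s_0 = s,\ \pi\big),

so solutions for individual events can, in principle, be recombined into a solution for the composite task.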

Here, we decompose long-horizon Boolean logic tasks with the Options Framework in a way that lets us obtain an optimal meta-policy by solving a linear equation; in fact, the meta-policy solution is a principal eigenvector of a matrix. This builds upon Emo Todorov's theory of Linearly Solvable MDPs, extending it to hierarchical state spaces and goal-conditioned options.
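For context, this is the form that linear solvability takes in Todorov's original, non-hierarchical setting, which the paper extends: exponentiating the value function into a desirability z(s) = e^{-v(s)} makes the Bellman equation linear, and in the infinite-horizon average-cost case z is the principal eigenvector of a matrix built from the state costs q and the passive dynamics P:

\lambda\, z \;=\; \operatorname{diag}\!\big(e^{-q}\big)\, P\, z, \qquad z(s) = e^{-v(s)}.

Power iteration on that matrix is one standard way to recover z.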

Current Work

I also develop algorithms for Safe AI applications and am currently participating in the MATS program. A recent paradigm, known as Guaranteed Safe AI, calls for algorithms that can probabilistically verify that an agent's policy satisfies task specifications and safety constraints in large world models, and I see my work as contributing to this effort. Currently I am building on my previous theory to create agents whose underlying decision theory can be modified at test time. This would allow agents to use various decision theories relevant to Safe AI, rather than being limited to expected value theory.
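To make "modifiable decision theory" a bit more concrete, here is a toy sketch (purely illustrative: the interface, function names, and particular criteria are my own, not the agent's actual algorithm). The idea is that the rule used to score a candidate action's return distribution is a pluggable argument, so swapping it at test time changes the decision theory from risk-neutral expected value to something more conservative.

```python
import numpy as np

# Toy illustration only: these criteria and this interface are hypothetical,
# not the actual agent. The scoring rule applied to a return distribution
# (the "decision theory") is passed in as an argument, so it can be swapped
# at test time without retraining anything.

def expected_value(returns):
    return float(np.mean(returns))

def worst_case(returns):
    return float(np.min(returns))

def cvar(returns, alpha=0.1):
    # Mean of the worst alpha-fraction of outcomes: a risk-averse criterion.
    cutoff = np.quantile(returns, alpha)
    return float(returns[returns <= cutoff].mean())

def choose_action(return_samples_per_action, criterion):
    # Score each candidate action's sampled returns and pick the best.
    scores = [criterion(samples) for samples in return_samples_per_action]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
candidates = [
    rng.normal(1.0, 3.0, size=1000),   # higher mean, but risky
    rng.normal(0.5, 0.2, size=1000),   # lower mean, much safer
]

print(choose_action(candidates, expected_value))  # risk-neutral: likely picks 0
print(choose_action(candidates, cvar))            # risk-averse: likely picks 1
```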

About

I studied genetics in undergrad, but eventually I decided I wanted to understand more abstract principles of biological intelligence, so I became a computer scientist and started working on Markov Decision Processes. I want to explain where the flexibility of animal intelligence comes from and what principles promote it. My goal is to create the algorithmic foundations for a theory of planning that achieves human-level sample efficiency in complex environments like Atari games.

Curriculum Vitae

Contact

You can contact me at: rings034@gmail.com