Matteo Bortoletto
If you think one or more papers are missing, please let me know:
matteo [dot] bortoletto [at] vis [dot] uni-stuttgart [dot] de
Table of contents
Formalising ToM
Proposes a metric to compute the complexity of a ToM task and a new prompting strategy called Discrete World Model (DWM)
- The complexity of a ToM task w.r.t. an object obj (i.e., an entity in the task) is defined as the complexity of the stateful states plus the (discounted) sum of the stateless ones (see the sketch after this list)
- Complexity of the stateful states = number of times the object in consideration changes state
- Complexity of the stateless states = number of times another object changes state
- DWM prompting: as the story unfolds, the LLM is prompted multiple times to describe the state of the environment, and each description is concatenated to the next query (see the loop sketch after this list). The complexity of DWM is higher than that of CoT and on par with ToT → DWM shows relatively small and inconsistent improvements across datasets.
- They show that ToMi and FANToM have been memorised by GPT-3.5-Instruct.
- The statefulness of each problem strongly correlates with the best-performing DWM split, and the error rate strongly correlates with the complexity of a task
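A minimal sketch of how such a complexity score could be computed, assuming the story has already been parsed into a list of state-change events (one object name per change) and that the discount factor is a free parameter — both are assumptions, not the paper's exact formulation:

```python
def tom_complexity(events, obj, gamma=0.5):
    """Complexity of a ToM task w.r.t. `obj`.

    events: list of object names, one entry per state change in the story.
    gamma:  assumed discount applied to state changes of other objects.
    """
    stateful = sum(1 for e in events if e == obj)    # obj itself changes state
    stateless = sum(1 for e in events if e != obj)   # other objects change state
    return stateful + gamma * stateless

# Example: the tracked object changes state twice, other objects three times.
events = ["ball", "basket", "ball", "box", "basket"]
print(tom_complexity(events, "ball"))  # 2 + 0.5 * 3 = 3.5
```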
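And a rough sketch of the DWM prompting loop described above, assuming a hypothetical `llm` callable and that the story has been split into chunks beforehand (split size and prompt wording are illustrative, not the paper's exact prompts):

```python
def dwm_answer(llm, story_chunks, question):
    """Discrete World Model prompting: after each story chunk, ask the LLM to
    describe the current state of the environment and carry that description
    forward into the next query."""
    context = ""
    for chunk in story_chunks:
        context += chunk + "\n"
        state = llm(context + "\nDescribe the current state of the environment.")
        context += "State: " + state + "\n"
    return llm(context + "\n" + question)
```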
Question Answering
Proposes BigToM. Because existing datasets lack control conditions, it is difficult to identify where models make mistakes; the proposed solution is to use causal templates + an LLM to generate the dataset.
- Template: given an agent with a desire that performs an action resulting in a belief, change the state of the environment and manipulate the agent's percepts to create true or false beliefs
- Test LLMs for forward belief P(belief | percepts), backward belief P(belief | observed actions), and forward action P(action | percepts) inference
- Only first-order beliefs
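A sketch of what a BigToM-style causal template and its three inference conditions might look like as a data structure; the field names and prompt wording below are illustrative assumptions, not the paper's exact schema:

```python
from dataclasses import dataclass

@dataclass
class CausalTemplate:
    agent: str
    desire: str
    percept: str          # what the agent perceives
    belief: str           # belief formed from the percept
    action: str           # action taken given desire + belief
    causal_event: str     # change in the environment state
    aware_of_event: bool  # manipulated to create true vs. false beliefs

def make_queries(t: CausalTemplate):
    """The three first-order inference conditions tested on LLMs."""
    return {
        "forward_belief":  f"Given what {t.agent} perceives, what does {t.agent} believe?",  # P(belief | percepts)
        "backward_belief": f"Given {t.agent}'s action, what did {t.agent} believe?",         # P(belief | observed actions)
        "forward_action":  f"Given what {t.agent} perceives, what will {t.agent} do?",       # P(action | percepts)
    }
```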