TY - CPAPER
KW - Building control
KW - Reinforcement learning
KW - Off-policy evaluation
AU - Bingqing Chen
AU - Ming Jin
AU - Zhe Wang
AU - Tianzhen Hong
AU - Mario Bergés
AB - We present an initial study of off-policy evaluation (OPE), a problem prerequisite to real-world reinforcement learning (RL), in the context of building control. OPE is the problem of estimating a policy’s performance without running it on the actual system, using historical data from the existing controller. It enables the control engineers to ensure a new, pretrained policy satisfies the performance requirements and safety constraints of a real-world system, prior to interacting with it. While many methods have been developed for OPE, no study has evaluated which ones are suitable for building operational data, which are generated by deterministic policies and have limited coverage of the state-action space. After reviewing existing works and their assumptions, we adopted the approximate model (AM) method. Furthermore, we used bootstrapping to quantify uncertainty and correct for bias. In a simulation study, we evaluated the proposed approach on 10 policies pretrained with imitation learning. On average, the AM method estimated the energy and comfort costs with 1.84% and 14.1% error, respectively.
BT - Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities
CY - Virtual Event, Japan; New York, NY, USA
DA - 2020/11//
DO - 10.1145/3427773.3427871
LA - eng
PB - ACM
PP - Virtual Event, Japan; New York, NY, USA
PY - 2020
SN - 9781450381932
T2 - Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities
T3 - BuildSys '20: The 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
TI - Towards Off-policy Evaluation as a Prerequisite for Real-world Reinforcement Learning in Building Control
UR - https://dl.acm.org/doi/proceedings/10.1145/3427773
UR - https://dl.acm.org/doi/10.1145/3427773.3427871
UR - https://dl.acm.org/doi/pdf/10.1145/3427773.3427871
ER -