Decarbonization goals in the United States electricity sector are raising the levels of renewable energy generation
in the electricity supply system and are driving greater attention to building electrification, which will increase
the magnitude and shift the timing of the electricity system peak. These changes are motivating new approaches to
coordinate building electricity demand with low-carbon renewable generation, elevating the importance of demand
flexibility (DF) in buildings and the need to quantify the temporal impacts of DF. In this paper, we first characterize
the hourly predictive accuracy of six commonly used baseline models in an application context of quantifying
building-level load shift. Our analysis showed that certain hours of the day (afternoons), periods of the week
(weekends), and seasons (summer) were predicted more accurately than other time periods. In addition, the
analysis revealed tendencies toward overprediction or underprediction of load. Second, we provide the first
published investigation of baseline erosion from repeated dispatch of building load shifting. We observed that as the
baseline period is pushed farther back from the prediction day, the spread of errors across baseline model
predictions widens, with notable inflection points near the three-week erosion point for two of the three models.