TY - JOUR
AU - Xin Chen
AU - Baojie Li
AU - Jennifer L Braid
AU - Brandon Byford
AU - Dylan J Colvin
AU - Andrew Glaws
AU - Norman Jost
AU - Benjamin Pierce
AU - Salil Rabade
AU - Martin Springer
AU - Anubhav Jain
AB -

Photovoltaic (PV) systems have become a cornerstone of renewable energy strategies, particularly due to the significant reduction in solar power costs over the past decade. However, the long-term reliability of PV installations presents a persistent challenge, requiring the development of advanced monitoring and predictive maintenance strategies. A wide range of data types is used to evaluate the health of PV systems, including environmental conditions, electrical performance, and inspection imagery. These data enable methodologies such as machine learning (ML) models for lifetime prediction and computer vision techniques for defect detection. However, the acquisition of high-quality and comprehensive data is difficult, particularly in terms of long-term consistency and data variety. Publicly available data sets serve as valuable resources for addressing these challenges, but they often suffer from fragmentation and are difficult to access. This paper presents a comprehensive review of existing open-source data sets related to PV degradation, analyzing their features, functionalities, and potential applications. We categorize these data sets based on the specific aspects of PV system information they cover, such as environmental conditions, operational monitoring, image inspection and module materials, and propose relevant tools and ML models for processing them. In addition, we propose practices for future data collection and usage, while also discussing potential directions in data-driven research. Our aim is to enhance data utilization and publication among researchers and industry professionals, promoting a deeper understanding of the role of data in enhancing the performance and durability of PV systems.

BT - Applied Energy
DA - 10/2025
DO - 10.1016/j.apenergy.2025.126132

PB - Elsevier BV
PY - 2025
EP - 126132
T2 - Applied Energy
TI - Open data sets for assessing photovoltaic system reliability
UR - https://doi.org/10.1016/j.apenergy.2025.126132
VL - 395
SN - 0306-2619
ER -