TY - JOUR
T1 - Automated metadata annotation
T2 - What is and is not possible with machine learning
AU - Wu, Ming Fang
AU - Brandhorst, Hans
AU - Marinescu, Maria Cristina
AU - Lopez, Joaquim More
AU - Hlava, Margorie
AU - Busch, Joseph
N1 - Publisher Copyright:
© 2022 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
PY - 2023/12/1
Y1 - 2023/12/1
AB - Automated metadata annotation is only as good as the training dataset, or the rules, available for the domain. It’s important to learn what type of data content a pre-trained machine learning algorithm has been trained on in order to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm: what’s popular and what’s available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that machine learning requires. There are exceptions, such as science and medicine, where large, well-documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.
KW - Cultural heritage
KW - Metadata annotation
KW - Metadata
KW - Machine learning
KW - Research data
UR - http://www.scopus.com/inward/record.url?scp=85150068543&partnerID=8YFLogxK
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=pure_univeritat_ramon_llull&SrcAuth=WosAPI&KeyUT=WOS:000945967800007&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1162/dint_a_00162
DO - 10.1162/dint_a_00162
M3 - Article
AN - SCOPUS:85150068543
SN - 2096-7004
VL - 5
SP - 122
EP - 138
JO - Data Intelligence
JF - Data Intelligence
IS - 1
ER -