Automated metadata annotation: What is and is not possible with machine learning

Ming Fang Wu; Hans Brandhorst; Maria Cristina Marinescu; Joaquim More Lopez; Margorie Hlava; Joseph Busch

doi:10.1162/dint_a_00162

Research output: Article in indexed journal › Article › Peer-reviewed

6 Citations (Scopus)

Abstract

Automated metadata annotation is only as good as the training dataset or the rules that are available for the domain. It’s important to learn what type of data content a pre-trained machine learning algorithm has been trained on to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm: what’s popular and what’s available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that is required for machine learning. There are exceptions such as science and medicine, where large, well-documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.

Original language	English
Pages (from-to)	122-138
Number of pages	17
Journal	Data Intelligence
Volume	5
Issue number	1
DOIs	https://doi.org/10.1162/dint_a_00162
Publication status	Published - 1 Dec 2023
Published externally	Yes

Access to document

10.1162/dint_a_00162

Other files and links

Link to publication in Scopus

Cite this

@article{67c2e5b3b17d481196d1ef370de5223e,

title = "Automated metadata annotation: What is and is not possible with machine learning",

abstract = "Automated metadata annotation is only as good as training dataset, or rules that are available for the domain. It{\textquoteright}s important to learn what type of data content a pre-trained machine learning algorithm has been trained on to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm—what{\textquoteright}s popular and what{\textquoteright}s available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that is required for machine learning. There are exceptions such as science and medicine, where large, well documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.",

keywords = "Culture heritage, Metadata annotation, Metadata, Machine learning, Research data",

author = "Wu, {Ming Fang} and Hans Brandhorst and Marinescu, {Maria Cristina} and Lopez, {Joaquim More} and Margorie Hlava and Joseph Busch",

note = "Publisher Copyright: {\textcopyright} 2022 Chinese Academy of Sciences. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.",

year = "2023",

month = dec,

day = "1",

doi = "10.1162/dint_a_00162",

language = "English",

volume = "5",

pages = "122--138",

journal = "Data Intelligence",

issn = "2096-7004",

publisher = "China National Publications Import and Export (Group) Corporation",

number = "1",

}

TY - JOUR

T1 - Automated metadata annotation

T2 - What is and is not possible with machine learning

AU - Wu, Ming Fang

AU - Brandhorst, Hans

AU - Marinescu, Maria Cristina

AU - Lopez, Joaquim More

AU - Hlava, Margorie

AU - Busch, Joseph

PY - 2023/12/1

Y1 - 2023/12/1

N2 - Automated metadata annotation is only as good as training dataset, or rules that are available for the domain. It’s important to learn what type of data content a pre-trained machine learning algorithm has been trained on to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm—what’s popular and what’s available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that is required for machine learning. There are exceptions such as science and medicine, where large, well documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.

AB - Automated metadata annotation is only as good as training dataset, or rules that are available for the domain. It’s important to learn what type of data content a pre-trained machine learning algorithm has been trained on to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm—what’s popular and what’s available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that is required for machine learning. There are exceptions such as science and medicine, where large, well documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.

KW - Culture heritage

KW - Metadata annotation

KW - Metadata

KW - Machine learning

KW - Research data

UR - http://www.scopus.com/inward/record.url?scp=85150068543&partnerID=8YFLogxK

U2 - 10.1162/dint_a_00162

DO - 10.1162/dint_a_00162

M3 - Article

AN - SCOPUS:85150068543

SN - 2096-7004

VL - 5

SP - 122

EP - 138

JO - Data Intelligence

JF - Data Intelligence

IS - 1

ER -
