Digital humanities project CLARIAH is digitizing datasets from various sources to derive new research from them

At the Third eScience Symposium, recently held at the ArenA in Amsterdam, The Netherlands, we had the opportunity to talk with José van Dijck, President of the Royal Netherlands Academy of Arts and Sciences (KNAW) and Professor of Media Studies at the University of Amsterdam. She is very interested in digital humanities, a new branch of the humanities that works with digital sources. Her team works with digital text but also with digital images and sounds. They work with structured data and are interested in how digital data becomes a new material from which they draw new interpretations. All this data poses many new challenges for the team as researchers.

Last year, they received a large grant of 12 million euro from the Dutch National Science Council to set up an infrastructure for all of the humanities. They can now develop instruments for better understanding and interpreting digital files. For example, their historians are developing instruments to interpret large numbers of digitized files drawn from textual archives.

José van Dijck is working with audio-visual data that have been digitized over the past five to ten years. Currently, her team is interested in how these digital files can be mined, especially with digital instruments.

The project is named CLARIAH. Quite a lot of researchers are involved, from at least four Dutch universities.

Currently, José van Dijck is building a coalition for digital humanities in The Netherlands in which at least 7 or 8 universities and 5 national institutes will be involved. All these researchers are interested in working with digitized data files that can help them get better at what they already do.

From the very beginning, they are trying to bring about close collaboration between humanities researchers, such as historians, media scholars, and linguists, on the one hand, and computer scientists on the other. The aim is to articulate their questions together so that they can do the research in collaboration and cooperation, rather than treating computer scientists as helpers for humanities people.

José van Dijck hopes that humanities people will become better versed in interpreting coding languages and in the fundamentals of programming, so that they see that digital language is also a language of code and algorithms. Conversely, they are trying to convey that computer scientists are also interested in hermeneutics, in data interpretation, rather than just in algorithms. They really hope that both sides can collaborate and learn from each other.

They are dealing with very different kinds of data. For instance, they have structured data from archives, such as large demographic datasets from municipal archives dating to the 17th or 18th century. They also have audio-visual sources. The Netherlands has a large archive, the Institute of Sound and Vision, which is digitizing all of the television and radio programmes. Through subtitling, they can also get access to the text of the words spoken in certain programmes.

They also have linguistic data. If you transcribe the speech you hear on television, for instance, you get textual data that linguists can mine. There are many different kinds of data. The big challenge is to bring them together and to develop tools that help combine these different datasets.

They are trying to ask the big questions that they cannot answer yet. For instance, why do certain stereotypes of minorities persist in media images? This has been the case for 50 years now, and they want to research how those images develop. Today people talk about migrants, but at one point in time the term was guest workers. A number of different terms have been used for migrants coming to The Netherlands. It will be extremely interesting to combine datasets from a variety of sources, such as textual data, audio-visual data, and archives, to obtain real evidence of how these migrant groups underwent various cultural transformations. That is what they are looking into.

They have a very strong group in The Netherlands in all of the disciplines that José van Dijck mentioned, and Europe as a whole has a very strong case for digital humanities. There are very strong groups in Ireland but also in Siberia, for instance. One of her colleagues recently went to Siberia, where there is a very active group of digital humanities scholars. The UK, of course, has a very strong digital humanities presence, as do Germany and France. Groups across Europe collaborate on an ongoing basis, and they also collaborate with their American colleagues.

More information is available at the CLARIAH project website.