February 11th, 2008, posted by Djoerd Hiemstra
by Pavel Serdyukov, Henning Rode, and Djoerd Hiemstra
This paper describes several approaches which we used for
the expert search task of the TREC 2007 Enterprise track.
We studied several methods of relevance propagation from
documents to related candidate experts. Instead of the one-step
propagation from documents to directly related candidates
used by many systems in previous years, we do not limit the
relevance flow but disseminate it further through mutual
document-candidate connections. We model relevance
propagation using random walk principles, or in formal terms,
discrete Markov processes. We experiment with
infinite and finite numbers of propagation steps. We also
demonstrate how additional information, namely hyperlinks
among documents, organizational structure of the enterprise
and relevance feedback may be utilized by the presented
techniques.
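The finite-step variant can be illustrated with a minimal sketch. It is not the code used in the paper: the function name, the uniform transition probabilities, and the toy scores are assumptions made for illustration only. Relevance mass starts at the retrieved documents, flows to the candidates they mention, and may flow back and forth for a fixed number of rounds.

```python
from collections import defaultdict

def propagate_relevance(doc_scores, doc_candidates, rounds=2):
    """Finite random walk on a bipartite document-candidate graph.

    doc_scores:     {doc_id: retrieval score} (hypothetical input)
    doc_candidates: {doc_id: [candidate, ...]} (hypothetical input)
    rounds=1 corresponds to one-step propagation; every further round
    moves the mass back to the documents and out to the candidates again.
    Transitions are uniform over neighbours, which is an assumption.
    """
    # Reverse adjacency: which documents mention each candidate.
    cand_docs = defaultdict(list)
    for doc, cands in doc_candidates.items():
        for cand in cands:
            cand_docs[cand].append(doc)

    # Start from the normalized scores of documents that mention a candidate.
    linked = {d: s for d, s in doc_scores.items() if doc_candidates.get(d)}
    total = sum(linked.values())
    if total == 0:
        return {}
    doc_mass = {d: s / total for d, s in linked.items()}

    cand_mass = {}
    for round_no in range(rounds):
        if round_no > 0:
            # Step back: candidates -> documents.
            doc_mass = defaultdict(float)
            for cand, mass in cand_mass.items():
                for doc in cand_docs[cand]:
                    doc_mass[doc] += mass / len(cand_docs[cand])
        # Step forward: documents -> candidates.
        cand_mass = defaultdict(float)
        for doc, mass in doc_mass.items():
            cands = doc_candidates[doc]
            for cand in cands:
                cand_mass[cand] += mass / len(cands)
    return dict(cand_mass)

# Toy data, invented for this example.
scores = {"d1": 0.6, "d2": 0.3, "d3": 0.1}
mentions = {"d1": ["alice"], "d2": ["alice", "bob"], "d3": ["bob"]}
one_step = propagate_relevance(scores, mentions, rounds=1)
two_rounds = propagate_relevance(scores, mentions, rounds=2)
```

With more rounds the candidate scores depend less on the initial document ranking and more on the graph structure, which is why the number of steps is worth treating as a parameter.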
[download pdf]
Posted in Paper abstracts | No Comments »
February 2nd, 2008, posted by Djoerd Hiemstra

who’s who?
Posted in Uncategorized | No Comments »
January 28th, 2008, posted by Djoerd Hiemstra
The DB Colloquium of Tuesday 29 January, 14:00 h.-15:00 h. in ZI-3126
consists of two small presentations.
Comprehending historical election programs using XML and XRPC
by Douwe van der Meij
Party programs for elections can be incomprehensible, not
to mention the comparison of current party programs to those
of a decade ago. This paper focuses on a way to comprehend
the latter. It shows how to use XML to store election
programs and how to query them. This paper also comes with a
proof of concept (PoC). In retrospect we look at this PoC,
and we discuss the design choices made.
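As a rough illustration of storing and querying election programs as XML, here is a small sketch. The element and attribute names, the party, and the statements are all invented, and Python's standard xml.etree.ElementTree stands in for the XQuery/XRPC setup of the talk, which is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema and content, invented for this example.
PROGRAMS_XML = """
<programs>
  <program party="Party A" year="1998">
    <section topic="education">Smaller classes in primary schools.</section>
    <section topic="healthcare">Shorter waiting lists.</section>
  </program>
  <program party="Party A" year="2006">
    <section topic="education">Free school books.</section>
  </program>
</programs>
""".strip()

def statements_on(root, party, topic):
    """Return {year: text} for one party's sections on one topic."""
    result = {}
    # ElementTree supports a limited XPath subset, including
    # attribute predicates such as [@party='...'].
    for program in root.findall(f"./program[@party='{party}']"):
        for section in program.findall(f"./section[@topic='{topic}']"):
            result[int(program.get("year"))] = section.text.strip()
    return result

root = ET.fromstring(PROGRAMS_XML)
education = statements_on(root, "Party A", "education")
```

Placing programs from different years side by side in one document is what makes the comparison across a decade a simple query rather than a manual reading exercise.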
Finding books without an age category faster
by Wout Maaskant
In this research, a system was developed with which users
can more quickly find books that have no age category assigned
in a bol.com corpus, by making use of properties of comparable
books that do have an age category assigned. The similarity
between books is determined using the vector space model.
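The idea can be sketched as follows. This is an assumption-laden illustration, not the system from the talk: it uses plain term-frequency vectors, cosine similarity, and a single nearest neighbour, and the book descriptions and category labels are invented.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def guess_age_category(description, labelled_books):
    """Borrow the age category of the most similar labelled book.

    labelled_books: list of (description, age_category) pairs.
    """
    query = tf_vector(description)
    best = max(labelled_books, key=lambda book: cosine(query, tf_vector(book[0])))
    return best[1]

# Invented example data.
labelled = [
    ("a picture book about a little bear", "4-6"),
    ("a thriller about murder and politics", "adult"),
]
guess = guess_age_category("the little bear goes to school", labelled)
```

A real system would likely weight terms (for example with tf-idf) and combine several neighbours; the single nearest neighbour is used here only to keep the sketch short.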
Posted in Information Retrieval, Colloquia | No Comments »
January 25th, 2008, posted by Djoerd Hiemstra
The Dutch Research School for Information and Knowledge Systems (SIKS) organizes its research in 7 so-called research foci. The scope of the SIKS research focus Data Management, Storage and Retrieval is the theory and the application of computers to the management of information, including the aspects of data acquisition, organization, storage, querying and retrieval, security and privacy, ranging from highly structured databases to unstructured natural language texts.
The research focus Data Management, Storage and Retrieval is shaped by two major success stories in Computer Science: 1) the development of relational database systems in the 1970s and 1980s, mainly influenced by office automation and enterprise information systems, and 2) the development of large scale information retrieval systems at the end of the 1990s, influenced by the development of the world wide web. The storage and retrieval component of today’s information systems is formed by database management systems (DBMSs), which abstract the peculiarities of storage media and processing components into a data model, integrity rules, and query facilities. Although strong relational DBMSs have become a commodity product for administrative information system applications, they have proven inadequate for storing and searching semi-structured data such as web data. The storage and retrieval component of today’s web search engines is formed by information retrieval (IR) systems that provide effective ranking strategies, efficient indexes, and data compression. They focus on user satisfaction rather than on integrity of the data. Research themes that SIKS PhD students address are:
- Integration of Text, Data, and Streams: Create ways to integrate data retrieval and information retrieval, for instance for XML databases and XML streams.
- Multimedia Retrieval: Create easy ways to analyze, summarize, search, and view multimedia information such as video databases.
- Sensor Data and Sensor Networks: Create ways to manage and query networks of very large numbers of low-cost devices.
- Reasoning about Uncertain Data: Create ways to analyze, query and reason over imprecise and uncertain data (related to SIKS focus Knowledge representation & reasoning).
- Contextual Retrieval and User Interaction: Use knowledge about the user’s context to provide more effective results (related to SIKS focus Human computer interaction).
- Learning Ranking Algorithms: Create new models of information retrieval and machine learning of complex ranking functions (related to SIKS focus Computational intelligence).
- Enterprise Search and Data Spaces: Integrate enterprise information to support business processes, for example expertise finding (related to SIKS focus Enterprise information systems).
- Distributed and Peer-to-Peer Data Management: Create ways to distribute data over many loosely coupled autonomous systems (related to SIKS focus Agent technology and Web-based information systems).
Posted in SIKS | No Comments »
January 22nd, 2008, posted by Djoerd Hiemstra
Welcome to the course XML & Databases 1. Following this course means you will read about some of the latest research on XML databases. We have updated the course considerably this year. We will discuss XML querying with XPath and XQuery, publishing of relational data, a lot about relational storage structures for XML data and query processing techniques, and a number of new courses on full-text querying, distributed querying and querying stand-off data. The study material contains 6 new papers compared to last year’s reader. To obtain the study material necessary for this course, you need to buy the 2008 version of the reader “XML & Databases 1 & 2”.
More info on TeleTOP
Posted in XML & Databases 1 | No Comments »
January 10th, 2008, posted by Djoerd Hiemstra
(Case study: Expert search)
I will give a tutorial at the 30th European Conference on Information Retrieval (ECIR). The tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models and expectation maximization training. Expert search will be used as a case study to explain the consequences of modeling assumptions.
See the ECIR tutorials and workshops page
Posted in Uncategorized | No Comments »
December 19th, 2007, posted by Djoerd Hiemstra
by Djoerd Hiemstra, Henning Rode, and Jan Flokstra
PF/Tijah (Pathfinder/Tijah, pronounced “Pee Ef Teeja”) is a flexible open source text search system developed at the University of Twente in cooperation with CWI Amsterdam and TU München. The system is integrated in the Pathfinder XQuery database system and can be downloaded as part of MonetDB/XQuery. This report contains user documentation of PF/Tijah, including example usage in three showcases.
For more information, see: PF/Tijah site
Posted in Paper abstracts | No Comments »
December 12th, 2007, posted by Djoerd Hiemstra
There is more to life than computer science: below, thanks to Wim Brilman, is a photo of our noble UT futsal team at the yearly Christmas tournament.
We won one game, drew one, and lost three: we ended fifth in a group of six teams. Could be worse.
Posted in Uncategorized | No Comments »
December 10th, 2007, posted by Djoerd Hiemstra
The Dutch-Belgian Information Retrieval workshop (DIR) will take place in Maastricht on April 14-15, 2008. The primary aim of the DIR workshop is to provide an international meeting place where researchers from the domain of information retrieval and related disciplines can exchange information and present innovative research developments.
Hinrich Schuetze of the University of Stuttgart will give an invited talk at DIR2008 about his new book “Introduction to Information Retrieval” which will appear in 2008.
Deadline call for papers: 2 February 2008
http://www.ltci.ugent.be/DIR2008
Posted in Conferences & Workshops | No Comments »
December 4th, 2007, posted by Djoerd Hiemstra
by Henning Rode, Pavel Serdyukov, Djoerd Hiemstra, and Hugo Zaragoza
Today’s web search engines try to offer services for finding various kinds of information in addition to simple web pages, such as showing locations or answering simple fact queries. Understanding the association of named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular, we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other hand. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity’s indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models.
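To make the indegree-based baseline concrete, here is a small sketch of an entity containment graph. The data structures, function names, and scores are assumptions for illustration and are not taken from the paper. It contrasts a plain indegree count with an indegree weighted by the relevance of the containing text fragments.

```python
from collections import defaultdict

def indegree_ranking(fragment_entities):
    """Rank entities by the number of retrieved fragments containing them."""
    indegree = defaultdict(int)
    for entities in fragment_entities.values():
        for entity in set(entities):
            indegree[entity] += 1
    return sorted(indegree, key=indegree.get, reverse=True)

def weighted_indegree_ranking(fragment_scores, fragment_entities):
    """Rank entities by the summed relevance of the fragments containing them."""
    weight = defaultdict(float)
    for fragment, entities in fragment_entities.items():
        for entity in set(entities):
            weight[entity] += fragment_scores.get(fragment, 0.0)
    return sorted(weight, key=weight.get, reverse=True)

# Invented example: edges point from text fragments to contained entities.
contains = {"f1": ["alice"], "f2": ["bob"], "f3": ["bob"]}
relevance = {"f1": 0.9, "f2": 0.2, "f3": 0.1}

by_count = indegree_ranking(contains)
by_weight = weighted_indegree_ranking(relevance, contains)
```

In this toy graph the two rankings disagree: bob appears in more fragments, but alice appears in the most relevant one. A recursive random walk over the same graph would go one step further by letting entity scores flow back to the fragments and out again.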
[download pdf]
Posted in Paper abstracts | No Comments »