LSDS-IR: Large-Scale and Distributed Systems for Information Retrieval
The 2014 edition of the workshop is co-located with ACM WSDM 2014

23 Jan 2014

LSDS-IR 2014 Final Program

The 11th International Workshop on

Large-Scale and Distributed Systems

for Information Retrieval

http://www.lsdsir.org

Co-located with ACM WSDM 2014,
February 28, 2014, New York City, NY, USA

Final Program:

09:00-09:10 Welcome and Opening
09:10-10:10 Invited Talk: Query Optimization in Search Engines: State of the Art and Open Problems, Torsten Suel (NYU)
10:10-10:30 Distribution by Document Size
Andrew Kane and Frank Tompa
10:30-11:00 Coffee break & informal discussions
11:00-12:00 Invited Talk: Graph Computing and Social Cognitive Analytics for Connected Big Data, Ching-Yung Lin (IBM TJ Watson)
12:00-12:20 An Exploration of Postings List Contiguity in Main-Memory Incremental Indexing
Jimmy Lin and Nima Asadi
12:20-12:40 Cost-aware Intersection Caching and Processing Strategies for In-memory Inverted Indexes
Esteban Feuerstein and Gabriel Tolosa
12:40-14:00 Closing & Lunch break

23 Jan 2014

Invited Talk by Torsten Suel

We are glad to announce that Prof. Torsten Suel from the Polytechnic Institute of NYU will give an invited talk at LSDS-IR 2014.

Query Optimization in Search Engines: State of the Art and Open Problems

Abstract:

Large search engines such as Google, Bing, Baidu, and Yandex expend tremendous hardware and energy resources on processing user queries, with data centers being added continuously to keep up with the increasing query loads, data sizes, and user expectations of result quality. This has motivated a lot of industrial and academic research on how to optimize query processing performance and thus reduce these costs. We give a brief introduction to this challenge, discuss the state of the art from an academic perspective, and then suggest some directions and issues for future work. Topics include query optimization for complex ranking functions, static index pruning, and the role of algorithmic and machine learning techniques in query processing.

Bio:

Torsten Suel is a Professor in the Department of Computer Science and Engineering at the NYU School of Engineering (NYU Poly), where he directs a research group working on search engines and web mining technology. He holds a Diplom degree from the Technical University of Braunschweig (Germany), and a Ph.D. from the University of Texas at Austin. He joined NYU Poly (then called Polytechnic University) in 1998 after postdoctoral positions at the NEC Research Institute and Bell Labs. During 2008, he was a Principal Research Scientist at Yahoo! Research in Santa Clara, CA, while on leave from NYU Poly.

23 Jan 2014

Invited Talk by Ching-Yung Lin

We are glad to announce that Dr. Ching-Yung Lin from the IBM Watson Research Center will give an invited talk at LSDS-IR 2014.

Graph Computing and Social Cognitive Analytics for Connected Big Data

Abstract:

Information Technology is moving into the Cognitive Computing era with the advances of smarter machines to manage the challenges of the rapidly expanding world of Big Data and Analytics. The size and complexity of the data are fueling a movement towards Graph Computing, as traditional data management tools and techniques are not equipped to handle such non-uniform, semi-structured, and highly interconnected data. Many real-world data are linked. Entities are dependent. Processing, storing, analyzing, retrieving, and visualizing connected data has been a major challenge for Big Data. Novel graph computing technologies are driving fundamental paradigm shifts.
I am going to discuss the challenges of Graph Computing, including Graph Database, High Performance Computing, Middleware for Hardware Optimization, Analytics Library, and Visualization. Graphs may be large or small, static or dynamic, topological or semantic, and property-oriented or Bayesian. I will also introduce research challenges in applying Graph Computing technologies to (1) Cognitive Analytics, which uses graphical models to understand and predict people's behavior for Security or Commerce, (2) Social Analytics, which analyzes the collective behavior of people in social media, and (3) Brain Analytics, which models and visualizes the dynamic neuronal networks of animal brains, such as the mouse brain.

Bio:

Ching-Yung Lin is the Manager of the Network Science Department at the IBM T. J. Watson Research Center. He has also been an Adjunct Professor at Columbia University since 2005 and at NYU since 2014. His research interests are mainly in fundamental research on multimodality signal understanding, network computing, and computational social & cognitive sciences, and in applied research on security, commerce, and collaboration. Since 2011, Lin has been leading a team of more than 40 Ph.D. researchers in worldwide IBM Research Labs and more than 20 professors and researchers in 9 universities. He is the Principal Investigator of projects on Graph Computing and Social Cognitive Analytics. Ching-Yung is an author of 160+ publications and 19 issued patents. His team recently won the Best Paper Award in BigData 2013, Best Paper Award in CIKM 2012, and Best Theme Paper Award in ICIS 2011. He is a Fellow of IEEE.

6 Dec 2013

Deadline Extension

The submission deadline is extended to December 23, 2013.

Authors who have already submitted their paper can upload a new version as needed.

You have two and a half more weeks to submit your paper!

30 Sep 2013

LSDS-IR 2014 at WSDM

We are pleased to announce that the next edition of the Large-Scale and Distributed Systems for Information Retrieval workshop will be co-located with ACM WSDM 2014 in New York City.

Workshop date: Feb. 28, 2014

The call for papers is already available.

Organisers:
Nicola Tonellotto
Ismail Sengor Altingovde
Craig Macdonald
B. Barla Cambazoglu

31 Jan 2013

LSDS-IR 2013 Final Program

The 10th International Workshop on

Large-Scale and Distributed Systems

for Information Retrieval

http://www.lsdsir.org

Co-located with ACM WSDM 2013,
February 5, 2013, Rome, Italy

Final Program:

09:00-09:10 Welcome and Opening
09:10-10:10 Invited Talk: Analyzing the performance of top-k retrieval algorithms
Marcus Fontoura
10:10-10:40 Retrieval of Highly Dynamic Information in an Unstructured Peer-to-Peer Network
H. Asthana and Ingemar Cox.
10:40-11:00 Coffee break
Digital Libraries & Archives
11:00-11:30 Scalability Bottlenecks of the CiteSeerX Digital Library Search Engine
Jian Wu, Pradeep Teregowda, Eric Treece, Madian Khabsa, Douglas Jordan, Stephen Carman, Prasenjit Mitra and C. Lee Giles.
11:30-12:00 A Supervised Learning Method for Context-Aware Citation Recommendation in a Large Corpus
Lior Rokach.
12:00-12:30 User-Defined Redundancy in Web Archives
Bibek Paudel, Avishek Anand and Klaus Berberich.
12:30-13:00 Metric Suffix Array For Large-Scale Similarity Search
Hisham Mohamed and Stéphane Marchand-Maillet.
13:00-14:30 Lunch break
14:30-15:30 Invited Talk: Quasi-succinct indices
Sebastiano Vigna
15:30-16:00 Efficient Weighted Histogramming on GPUs with HASH
Maohua Zhu, Ningyi Xu, Di Wu, Chunshui Zhao, Yangdong Deng, Yu Wang and Feng-Hsiung Hsu.
16:00-16:30 Coffee break
Large Scale Techniques
16:30-17:00 Analysis of performance evaluation techniques for Large Scale Information Retrieval
Ana Freire, Fidel Cacheda, Vreixo Formoso and Víctor Carneiro.
17:00-17:30 Evaluating inverted files for visual compact codes on a large scale
Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi and Claudio Gennaro.
17:30-18:00 Open Discussion

The proceedings of the Workshop are available here: LSDS-IR 2013 Proceedings.

30 Nov 2012

Deadline Extension

The submission deadline is extended to December 7, 2012.

Authors who have already submitted their paper can upload a new version as needed.

You have one more week to submit your paper!

23 Oct 2012

Invited Talk by Marcus Fontoura

We are glad to announce that Dr. Marcus Fontoura from Google will give an invited talk at LSDS-IR 2013.

Analyzing the performance of top-k retrieval algorithms

Abstract:
Top-k retrieval is at the core of many modern applications: from large scale web search and advertising platforms, to text extenders and content management systems. In these systems, queries are evaluated using two major families of algorithms: document-at-a-time (DAAT) and term-at-a-time (TAAT). DAAT and TAAT algorithms have been studied extensively in the research literature. In this talk, I'll present an analysis and comparison of several DAAT and TAAT algorithms, focusing on the performance characteristics of these algorithms.
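For readers unfamiliar with the two families named in the abstract, the following is a minimal sketch of the difference, not the algorithms analyzed in the talk. It assumes a hypothetical toy index (`TOY_INDEX`) mapping each term to a postings list of (document id, score) pairs sorted by document id, and a simple additive scoring model; the function names `taat` and `daat` are illustrative only.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index (an assumption for illustration):
# term -> postings list of (doc_id, score), sorted by doc_id.
TOY_INDEX = {
    "large": [(1, 5), (3, 10), (4, 2)],
    "scale": [(1, 4), (2, 8), (4, 9)],
}


def taat(index, terms, k):
    """Term-at-a-time: process one postings list fully before the next,
    keeping a partial score (accumulator) for every document seen."""
    accumulators = defaultdict(int)
    for term in terms:
        for doc_id, score in index.get(term, []):
            accumulators[doc_id] += score
    # Pick the k highest-scoring documents; ties go to the smaller doc_id.
    return heapq.nlargest(
        k, accumulators.items(), key=lambda item: (item[1], -item[0])
    )


def daat(index, terms, k):
    """Document-at-a-time: advance all postings lists in parallel and
    finish scoring one document before moving to the next."""
    lists = [index[t] for t in terms if t in index]
    positions = [0] * len(lists)
    heap = []  # min-heap of (score, -doc_id) holding the current top k
    while True:
        candidates = [
            lst[pos][0] for lst, pos in zip(lists, positions) if pos < len(lst)
        ]
        if not candidates:
            break
        current = min(candidates)  # smallest unprocessed doc_id
        score = 0
        for i, lst in enumerate(lists):
            if positions[i] < len(lst) and lst[positions[i]][0] == current:
                score += lst[positions[i]][1]
                positions[i] += 1
        entry = (score, -current)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [(-neg_id, score) for score, neg_id in sorted(heap, reverse=True)]


if __name__ == "__main__":
    print(taat(TOY_INDEX, ["large", "scale"], 2))
    print(daat(TOY_INDEX, ["large", "scale"], 2))
```

Both functions return the same ranking on this toy data. The practical difference is in memory and access pattern: the TAAT sketch keeps an accumulator per matching document, while the DAAT sketch keeps only a heap of k entries but must traverse all lists in lockstep.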

Bio:
Marcus Fontoura finished his Ph.D. studies in 1999 at the Pontifical Catholic University of Rio de Janeiro, Brazil (PUC-Rio), in a joint program with the Computer Systems Group, University of Waterloo, Canada. Since then he has held research posts at the Princeton University Computer Science Department, IBM Almaden Research Center, and Yahoo! Research. Currently he is a Research Scientist and Member of Technical Staff at Google. His main areas of research in recent years have been Web Search, Computational Advertising, Enterprise Search, and Databases. He has more than 40 published papers and 20 issued patents. His complete CV is available at: http://fontoura.org.

12 Oct 2012

Invited Talk by Sebastiano Vigna

We are glad to announce that Prof. Sebastiano Vigna from the Università degli Studi di Milano will give an invited talk at LSDS-IR 2013.

Quasi-succinct indices

Abstract:
Compressed inverted indices in use today are based on the idea of gap compression: document pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller gaps using fewer bits. Additional data such as counts and positions is stored using similar techniques. A large body of research has been built in the last 30 years around gap compression, including theoretical modeling of the gap distribution, specialized instantaneous codes suitable for gap encoding, and ad hoc document reorderings which increase the efficiency of instantaneous codes. This talk will illustrate the proposal to represent an index using a different architecture based on quasi-succinct representation of monotone sequences. We will show that, besides being theoretically elegant and simple, the new index provides expected constant-time operations, space savings, and, in practice, significant performance improvements on conjunctive, phrasal and proximity queries.
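As an illustration of the gap compression idea that the abstract describes as the current practice (not of the quasi-succinct representation proposed in the talk), here is a minimal sketch. It assumes variable-byte coding as the "suitable code"; that choice, and all function names below, are assumptions made for this example only.

```python
def encode_gaps(doc_ids):
    """Turn an increasing list of document pointers into gaps.
    The first gap is taken relative to 0."""
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def decode_gaps(gaps):
    """Rebuild the document pointers by summing the gaps."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, low bits first;
    the high bit is set on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers = []
    n = 0
    shift = 0
    for byte in data:
        if byte & 0x80:  # last byte of the current number
            n |= (byte & 0x7F) << shift
            numbers.append(n)
            n = 0
            shift = 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


if __name__ == "__main__":
    postings = [100000, 100003, 100007, 100300]
    gaps = encode_gaps(postings)
    compressed = vbyte_encode(gaps)
    print(gaps, len(compressed), len(vbyte_encode(postings)))
    assert decode_gaps(vbyte_decode(compressed)) == postings
```

Because small numbers take fewer bytes under this code, encoding the gaps rather than the raw pointers shrinks the list whenever successive document pointers are close together.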

Bio:
Sebastiano Vigna obtained his PhD in Computer Science from the Università degli Studi di Milano, where he is currently an Associate Professor. His interests lie in the interaction between theory and practice. He has worked on highly theoretical topics such as computability on the reals, distributed computability, self-stabilization, minimal perfect hashing, succinct data structures, query recommendation, algorithms for large graphs and theoretical/experimental analysis of spectral rankings such as PageRank, but he is also (co)author of several widely used software tools ranging from high-performance Java libraries to a model-driven software generator, a search engine, a crawler, a text editor and a graph compression framework. In 2011 he collaborated on the computation of the distance distribution of the whole Facebook graph, from which it was possible to infer that on Facebook there are just 3.74 degrees of separation.

31 Oct 2011

LSDS-IR '11 Best Paper Award

This year's award is given to the paper entitled "Query efficiency prediction for dynamic pruning" by Nicola Tonellotto, Craig Macdonald, and Iadh Ounis. We congratulate the authors for their great work.

The decision was made by taking into account the large amount of discussion this paper generated during the workshop and, more importantly, the positive feedback it received from the reviewers.
