SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

SESSION: Keynote 1

From Data to Insights @ Bare Metal Speed

SESSION: Research Session 1 - Cloud: Parallel Execution

Distributed Outlier Detection using Compressive Sensing

Locality-aware Partitioning in Parallel Database Systems

ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout

Implicit Parallelism through Deep Language Embedding

From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System

SESSION: Research Session 2 - Matrix and Array Computations

sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms

Exploiting Matrix Dependency for Efficient Distributed Matrix Computation

LEMP: Fast Retrieval of Large Entries in a Matrix Product

Skew-Aware Join Optimization for Array Databases

Resource Elasticity for Large-Scale Machine Learning

SESSION: Research Session 3 - Security and Access Control

SEMROD: Secure and Efficient MapReduce Over HybriD Clouds

Authenticated Online Data Integration Services

ENKI: Access Control for Encrypted Query Processing

Collaborative Access Control in WebdamLog

Automatic Enforcement of Data Use Policies with DataLawyer

SESSION: Industry Session 1 - Streaming/Real-Time/Active

TencentRec: Real-time Stream Recommendation in Practice

Twitter Heron: Stream Processing at Scale

Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

Why Big Data Industrial Systems Need Rules and What We Can Do About It


Overview of Data Exploration Techniques


Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype?

SESSION: Research Session 4 - Cloud: Fault Tolerance, Reconfiguration

Cost-based Fault-tolerance for Parallel Data Processing

Squall: Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases

Madeus: Database Live Migration Middleware under Heavy Workloads for Cloud Environment

Lineage-driven Fault Injection

SESSION: Research Session 5 - Keyword Search and Text

Diversity-Aware Top-k Publish/Subscribe for Text Stream

Diverse and Proportional Size-l Object Summaries for Keyword Search

Local Filtering: Improving the Performance of Approximate Queries on String Collections

Exact Top-k Nearest Keyword Search in Large Networks

Efficient Algorithms for Answering the m-Closest Keywords Query

SESSION: Research Session 6 - Graph Primitives

Minimum Spanning Trees in Temporal Graphs

Efficient Enumeration of Maximal k-Plexes

Divide & Conquer: I/O Efficient Depth-First Search

Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity

SESSION: Research Session 7 - Data Mining

COMMIT: A Scalable Approach to Mining Communication Motifs from Dynamic Networks

LASH: Large-Scale Sequence Mining with Hierarchies

Twister Tries: Approximate Hierarchical Agglomerative Clustering for Average Distance in Linear Time

DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation

The TagAdvisor: Luring the Lurkers to Review Web Items

SESSION: Research Session 8 - Uncertainty and Linking

Supporting Data Uncertainty in Array Databases

Identifying the Extent of Completeness of Query Answers over Partially Complete Databases

k-Hit Query: Top-k Query with Probabilistic Utility Function

Linking Temporal Records for Profiling Entities

SESSION: Industry Session 2 - Applications

Telco Churn Prediction with Big Data

The LDBC Social Network Benchmark: Interactive Workload

Rethinking Data-Intensive Science Using Scalable Analytics Systems

QMapper for Smart Grid: Migrating SQL-based Application to Hive

SESSION: ACM-W Athena Lecturer Award

Three Favorite Results

SESSION: Keynote 2

The Power Behind the Throne: Information Integration in the Age of Data-Driven Discovery

SESSION: Research Session 9 - Transactional Architectures

On the Design and Scalability of Distributed Shared-Data Databases

Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems

FOEDUS: OLTP Engine for a Thousand Cores and NVRAM

Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems

SESSION: Research Session 10 - Privacy

Private Release of Graph Statistics using Ladder Functions

Bayesian Differential Privacy on Correlated Data

Modular Order-Preserving Encryption, Revisited

Chiaroscuro: Transparency and Privacy for Massive Personal Time-Series Clustering

SESSION: Research Session 11 - Streams

Persistent Data Sketching

Scalable Distributed Stream Join Processing

SCREEN: Stream Data Cleaning under Speed Constraints

Location-Aware Pub/Sub System: When Continuous Moving Queries Meet Dynamic Event Streams


CE-Storm: Confidential Elastic Processing of Data Streams

A SQL Debugger Built from Spare Parts: Turning a SQL:1999 Database System into Its Own Debugger

Exploratory Keyword Search with Interactive Input

QE3D: Interactive Visualization and Exploration of Complex, Distributed Query Plans

DataXFormer: An Interactive Data Transformation Tool

Quality-Driven Continuous Query Execution over Out-of-Order Data Streams

MoDisSENSE: A Distributed Spatio-Temporal and Textual Processing Platform for Social Networking Services

DocRicher: An Automatic Annotation System for Text Documents Using Social Media

A Demonstration of Rubato DB: A Highly Scalable NewSQL Database System for OLTP and Big Data Applications

G-OLA: Generalized On-Line Aggregation for Interactive Analysis on Big Data


Mining and Forecasting of Big Time-series Data

SESSION: Research Session 12 - Spatial Data

Optimal Spatial Dominance: An Effective Search of Nearest Neighbor Candidates

THERMAL-JOIN: A Scalable Spatial Join for Dynamic Workloads

Indexing Metric Uncertain Data for Range Queries

Efficient Route Planning on Public Transportation Networks: A Labelling Approach

SESSION: Research Session 13 - Crowdsourcing

The Importance of Being Expert: Efficient Max-Finding in Crowdsourcing

Minimizing Efforts in Validating Crowd Answers

iCrowd: An Adaptive Crowdsourcing Framework

QASCA: A Quality-Aware Task Assignment System for Crowdsourcing Applications

tDP: An Optimal-Latency Budget Allocation Strategy for Crowdsourced MAXIMUM Operations


Thrifty: Offering Parallel Database as a Service using the Shared-Process Approach

BenchPress: Dynamic Workload Control in the OLTP-Bench Testbed

Demonstrating "Data Near Here": Scientific Data Search

Slider: An Efficient Incremental Reasoner

WANalytics: Geo-Distributed Analytics for a Data Intensive World

FTT: A System for Finding and Tracking Tourists in Public Transport Services

SharkDB: An In-Memory Storage System for Massive Trajectory Data

Ringo: Interactive Graph Analytics on Big-Memory Machines

STORM: Spatio-Temporal Online Reasoning and Management of Large Spatio-Temporal Data

PAXQuery: Parallel Analytical XML Processing

SESSION: Research Session 14 - Indexing & Performance

Cache-Efficient Aggregation: Hashing Is Sorting

Efficient Similarity Join and Search on Multi-Attribute Data

Holistic Indexing in Main-memory Column-stores

CliffGuard: A Principled Framework for Finding Robust Database Designs

Exploiting Correlations for Expensive Predicate Evaluation

SESSION: Research Session 15 - Data Cleaning

Query-Oriented Data Cleaning with Oracles

BigDansing: A System for Big Data Cleansing

Data X-Ray: A Diagnostic Tool for Data Errors

KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing

Crowd-Based Deduplication: An Adaptive Approach

SESSION: Research Session 16 - Transactions

Minimizing Commit Latency of Transactions in Geo-Replicated Data Stores

Optimizing Optimistic Concurrency Control for Tree-Structured, Log-Structured Databases

The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis

Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity

SESSION: Industry Session 3 - Novel Systems

REEF: Retainable Evaluator Execution Framework

Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications

Design and Implementation of the LogicBlox System

Spark SQL: Relational Data Processing in Spark


Graft: A Debugging Tool For Apache Giraph

Even Metadata is Getting Big: Annotation Summarization using InsightNotes

StoryPivot: Comparing and Contrasting Story Evolution

The Flatter, the Better: Query Compilation Based on the Flattening Transformation

D2WORM: A Management Infrastructure for Distributed Data-centric Workflows

NL2CM: A Natural Language Interface to Crowd Mining

Optimistic Recovery for Iterative Dataflows in Action

A Secure Search Engine for the Personal Cloud

IReS: Intelligent, Multi-Engine Resource Scheduler for Big Data Analytics Workflows

Just can't get enough: Synthesizing Big Data

SESSION: Research Session 17 - Hardware-Aware Query Processing

Rack-Scale In-Memory Join Processing using RDMA

Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation

Rethinking SIMD Vectorization for In-Memory Databases

A Padded Encoding Scheme to Accelerate Scans by Leveraging Skew

SESSION: Research Session 18 - Graph Propagation, Influence, Mining

GetReal: Towards Realistic Selection of Influence Maximization Strategies in Competitive Networks

Influence Maximization in Near-Linear Time: A Martingale Approach

Community Level Diffusion Extraction

BEAR: Block Elimination Approach for Random Walk with Restart on Large Graphs

The Minimum Wiener Connector Problem

SESSION: Research Session 19 - Social Networks

From Group Recommendations to Group Formation

Real-Time Multi-Criteria Social Graph Partitioning: A Game Theoretic Approach

Utility-Aware Social Event-Participant Planning

Online Video Recommendation in Sharing Community

SESSION: Industry Session 4 - Performance

Large-scale Predictive Analytics in Vertica: Fast Data Transfer, Distributed Model Creation, and In-database Prediction

Oracle Workload Intelligence

Purity: Building Fast, Highly-Available Enterprise Flash Storage from Commodity Components

On Improving User Response Times in Tableau


Data Management in Non-Volatile Memory

SESSION: Research Session 20 - Information Extraction and Record Linking

TEGRA: Table Extraction by Global Record Alignment

Mining Quality Phrases from Massive Text Corpora

Mining Subjective Properties on the Web

Microblog Entity Linking with Social Temporal Context

SESSION: Research Session 21 - RDF and SPARQL

Graph-Aware, Workload-Adaptive SPARQL Query Caching

Left Bit Right: For SPARQL Join Queries with OPTIONAL Patterns (Left-outer-joins)

How to Build Templates for RDF Question/Answering: An Uncertain Graph Similarity Join Approach

RBench: Application-Specific RDF Benchmarking

ALEX: Automatic Link Exploration in Linked Data

SESSION: Research Session 22 - Time Series & Graph Processing

k-Shape: Efficient and Accurate Clustering of Time Series

SMiLer: A Semi-Lazy Time Series Prediction System for Sensors

SQLGraph: An Efficient Relational-Based Property Graph Store

Updating Graph Indices with a One-Pass Algorithm

SESSION: Industry Session 5 - Usability

Amazon Redshift and the Case for Simpler Data Warehouses

ShareInsights: An Unified Approach to Full-stack Data Processing

SESSION: Research Session 23 - Advanced Query Processing

An Incremental Anytime Algorithm for Multi-Objective Query Optimization

Output-sensitive Evaluation of Prioritized Skyline Queries

Learning Generalized Linear Models Over Normalized Data

Utilizing IDs to Accelerate Incremental View Maintenance

SESSION: Research Session 24 - New Models

S4: Top-k Spreadsheet-Style Search for Query Discovery

Proactive Annotation Management in Relational Databases

Weighted Coverage based Reviewer Assignment

Distributed Online Tracking


Knowledge Curation and Knowledge Fusion: Challenges, Models and Applications

SESSION: Undergraduate Abstracts

Smooth Task Migration in Apache Storm

JAFAR: Near-Data Processing for Databases

Job Scheduling with Minimizing Data Communication Costs

One Loop Does Not Fit All

DunceCap: Compiling Worst-Case Optimal Query Plans

DunceCap: Query Plans Using Generalized Hypertree Decompositions