The Statistics and Machine Learning Thematic Seminar

The Thematic Seminar is a national seminar in the Netherlands focusing on recent developments in statistics. It has been running since 2013, and its current theme is Machine Learning; its previous theme was Statistics for Structures.

The talks in the Thematic Seminar concern methodological, theoretical, and applied findings. Talks discussing the existing literature and presenting new results in the topic are welcome.

The complete internet — Source of the picture: http://www.unc.edu/~unclng/Internet_History.htm

For more information please contact Tim van Erven (tim@timvanerven.nl) or Botond Szabó (botond.szabo@unibocconi.it).

Schedule of the seminar

*Academic year 2024-2025*
Date	Speaker	Title	Location
Oct. 21, 2024 16.00-17.00	Julien Zhou Inria Grenoble-Alpes and Criteo AI Lab	Submodular Optimization with Bandit feedbacks	Hybrid: UvA, Korteweg-de Vries Institute, Science Park 107, Room F1.15 and Online https://uva-live.zoom.us/j/83485037604

*Academic year 2023-2024*
Date	Speaker	Title	Location
Apr. 25, 2024 14.00-15.00	Sebastian Bordt University of Tübingen	Representation and Interpretation	UvA, Korteweg-de Vries Institute, Science Park 107 Room F3.20
Aug. 31, 2023 16.00-17.00	Chloé Rouyer University of Kopenhagen	A near-optimal best-of-both-worlds algorithm for online learning with feedback graphs	UvA, Science Park 904 Room A1.24

*Academic year 2022-2023*
Date	Speaker	Title	Location
Jul. 7, 2023 15.00-16.00	Nikita Zhivotovskiy UC Berkeley	Sharper Risk Bounds for Statistical Aggregation	UvA, Science Park 904 Room A1.04
Nov. 14, 2022 16.00-17.00	Umut Şimşekli INRIA and École Normale Supérieure	Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms	UvA, Science Park 904 Room A1.24
Nov. 10, 2022 16.00-17.00	Damien Garreau Université Côte d'Azur	What does LIME really see in images?	UvA, Science Park 904 Room B0.204

*Academic year 2021-2022*
Date	Speaker	Title	Location
Jun. 10, 2022 16.00-17.00	Julia Olkhovskaya Vrije Universiteit Amsterdam	Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits	Online https://uva-live.zoom.us/j/89796690874
Apr. 22, 2022 16.00-17.00	Tor Lattimore DeepMind	Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret	Online https://uva-live.zoom.us/j/88233925917
Apr. 8, 2022 11.00-12.00	Nicolò Cesa-Bianchi Università degli Studi di Milano	The power of cooperation in networks of learning agents	Online https://uva-live.zoom.us/j/83684658173
Mar. 11, 2022 16.00-17.00	Tomer Koren Tel Aviv University	Benign Underfitting of Stochastic Gradient Descent	Online https://uva-live.zoom.us/j/85690850169
Nov. 26, 2021 16.00-17.00	Samory Kpotufe Columbia University	Some Recent Insights on Transfer and Multitask Learning	Online https://uva-live.zoom.us/j/85155421740 Meeting ID: 851 5542 1740.

*Academic year 2020-2021*
Date	Speaker	Title	Location
June 25, 2021 16.00-17.00	Gergely Neu Universitat Pompeu Fabra	Information-Theoretic Generalization Bounds for Stochastic Gradient Descent	Online https://uva-live.zoom.us/j/81805477265 Meeting ID: 818 0547 7265

*Academic year 2019-2020*
Date	Speaker	Title	Location
May 8, 2020 15:00-16:00	Paulo Serra and Stephanie van der Pas TU Eindhoven and Leiden University	...	... ...
April 24, 2020 15:00-16:00	Hanne Kekkonen and Sven Wang (Cambridge) TU Delft and Cambridge	...	... ...
March 20, 2020 15:00-16:00	Eduard Belitser VU Amsterdam	Phase transition and uncertainty quantification in variable selection and multiple testing	Snellius building, MI, Leiden University, Niels Bohrweg 1 room 401
Feb. 11, 2020 15:00-16:00	Hedi Hadiji Université Paris-Sud	Polynomial cost of adaptation for X-armed bandits	Snellius building, MI, Leiden University, Niels Bohrweg 1 room 402

*Academic year 2018-2019*
Date	Speaker	Title	Location
May 3 15:00-16:00	Madalin Guta University of Nottingham	An introduction to quantum state estimation	MI Leiden, Niels Bohrweg 1 room 405
May 3 16:00-17:00	Emilie Kaufmann Inria Lille	Generalized Likelihood Ratio Tests applied to Sequential Decision Making	MI Leiden, Niels Bohrweg 1 room 405
April 12 15:00-16:00	Joris Mooij UvA	How To Learn Causal Relations From Data?	KdVI, UvA, 107 Science Park, Amsterdam room F3.20
March 1 15:00-16:00	Frans Oliehoek TU Delft	Bayesian Reinforcement Learning for Problems with State Uncertainty	MI Leiden, Niels Bohrweg 1 room 405
November 30 15:00-16:00	Zeynep Akata University of Amsterdam	Explaining and Representing Novel Concepts With Minimal Supervision	MI Leiden, Niels Bohrweg 1 room 403
November 2 15:00-16:00	Dirk van der Hoeven Leiden University	The Many Faces of Exponential Weights in Online Learning	MI Leiden, Niels Bohrweg 1 room 403

See further below for the schedules of earlier years.

Abstracts

Julien Zhou (Inria Grenoble-Alpes and Criteo AI Lab), https://jlnzhou.github.io/): Submodular Optimization with Bandit feedbacks

University of Amsterdam, October 21, 16.00-17.00, 2024 and
Online: https://uva-live.zoom.us/j/83485037604

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a nonmonotone submodular function, taking values in a known bounded interval. We propose an Explore-then-Commit inspired approach satisfying logarithmic problem-dependent, 1/2-approximate regret, leveraging the inner structure of the reward.

Sebastian Bordt (University of Tübingen), https://sbordt.github.io/): Representation and Interpretation

University of Amsterdam, April 25, 2024.

Researchers have proposed different conditions under which a function is deemed "interpretable". This includes classes of functions that are considered a priori interpretable (small decision trees, GAMs), and post-hoc introspection methods that allow to "interpret" arbitrary learned functions. In this talk, we discuss a perspective on interpretability that highlights the connections between the representation and interpretation of a function. As a concrete example, we derive the connections between the Shapley Values of a function and its representation as a Generalized Additive Model. We then ask whether similar connections hold for other classes of functions, and conclude with a discussion of different approaches in interpretable machine learning.

Chloé Rouyer (University of Kopenhagen, https://sites.google.com/view/chloerouyer/home): A near-optimal best-of-both-worlds algorithm for online learning with feedback graphs

University of Amsterdam, August 31, 2023.

In this work, we consider an online learning problem that interpolates between bandits and full information problems: extra observations are granted depending on the action played and the position of that action in a feedback graph, which is known to the learner. While this problem has been widely studied against both adversarial and stochastic environments separately, aiming to obtain best-of-both-worlds guarantees is a challenging problem that has only been studied recently. We propose an algorithm that achieves near-optimal guarantees in both the adversarial and the stochastic regimes simultaneously. Our results are also computationally efficient and can naturally adapt to sequences of feedback graphs that change over time.

Nikita Zhivotovskiy (UC Berkeley, https://sites.google.com/view/nikitazhivotovskiy/): Sharper Risk Bounds for Statistical Aggregation

University of Amsterdam, July 7, 2023.

In this talk, we take a fresh look at the classical results in the theory of statistical aggregation, focusing on the transition from global complexity to a more manageable local one. The goal of aggregation is to combine several base predictors to achieve a prediction nearly as accurate as the best one. This flexible approach operates without any assumptions on the structure of the class or the nature of the target. Though aggregation is studied in both sequential and statistical settings, each with their unique differences, they both traditionally use the same "global" complexity measure. Our discussion will highlight the lesser-known PAC-Bayes localization method used in our proofs, allowing us to prove a localized version of a classical bound for the exponential weights estimator by Leung and Barron, and a deviation-optimal localized bound for the Q-aggregation estimator. Furthermore, we will explore the links between our work and ridge regression. Joint work with Jaouad Mourtada and Tomas Vaškevičius.

Umut Şimşekli (INRIA and École Normale Supérieure, https://www.di.ens.fr/~simsekli/): Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

University of Amsterdam, November 14, 2022.

Understanding generalization in deep learning has been one of the major challenges in statistical learning theory over the last decade. While recent work has illustrated that the dataset and the training algorithm must be taken into account in order to obtain meaningful generalization bounds, it is still theoretically not clear which properties of the data and the algorithm determine the generalization performance. In this talk, I will approach this problem from a dynamical systems theory perspective and represent stochastic optimization algorithms as random iterated function systems (IFS). Well studied in the dynamical systems literature, under mild assumptions, such IFSs can be shown to be ergodic with an invariant measure that is often supported on sets with a fractal structure. We will prove that the generalization error of a stochastic optimization algorithm can be bounded based on the ‘complexity’ of the fractal structure that underlies its invariant measure. Leveraging results from dynamical systems theory, we will show that the generalization error can be explicitly linked to the choice of the algorithm (e.g., stochastic gradient descent – SGD), algorithm hyperparameters (e.g., step-size, batch-size), and the geometry of the problem (e.g., Hessian of the loss). We will further specialize our results to specific problems (e.g., linear/logistic regression, one hidden-layered neural networks) and algorithms (e.g., SGD and preconditioned variants), and obtain analytical estimates for our bound. For modern neural networks, we will develop an efficient algorithm to compute the developed bound and support our theory with various experiments on neural networks.

The talk is based on the following publication:
Camuto, A., Deligiannidis, G., Erdogdu, M. A., Gurbuzbalaban, M., Simsekli, U., & Zhu, L. (2021). Fractal structure and generalization properties of stochastic optimization algorithms. Advances in Neural Information Processing Systems, 34, 18774-18788.

Damien Garreau (Université Côte d'Azur, https://sites.google.com/view/damien-garreau/home): What does LIME really see in images?

University of Amsterdam, November 10, 2022.

The performance of modern algorithms on certain computer vision tasks such as object recognition is now close to that of humans. This success was achieved at the price of complicated architectures depending on millions of parameters and it has become quite challenging to understand how particular predictions are made. Interpretability methods propose to give us this understanding. In this talk, I will present a recent result about LIME, perhaps one of the most popular methods.

Julia Olkhovskaya (Vrije Universiteit Amsterdam, https://sites.google.com/view/julia-olkhovskaya/home): Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

Online, June 10, 2022, 16.00-17.00. https://uva-live.zoom.us/j/89796690874, Meeting ID: 897 9669 0874

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy [2016] to the contextual setting by introducing a new concept of information ratio based on the mutual information between the unknown model parameter and the observed loss. This allows us to bound the regret in terms of the entropy of the prior distribution through a remarkably simple proof, and with no structural assumptions on the likelihood or the prior. We also extend our results to priors with infinite entropy under a Lipschitz assumption on the log-likelihood. An interesting special case is that of logistic bandits with d-dimensional parameters, K actions, and Lipschitz logits.

This is joint work with Gergely Neu, Matteo Papini and Ludovic Schwartz.

Tor Lattimore (DeepMind, http://tor-lattimore.com/): Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret

Online, April 22, 2022, 16.00-17.00. https://uva-live.zoom.us/j/88233925917, Meeting ID: 882 3392 5917

The information ratio developed by Russo and Van Roy (2014) is a powerful tool that was recently used to derive upper bounds on the regret for challenging sequential decision-making problems. I will talk about how a generalised version of this machinery can be used to derive lower bounds and give an application showing that a version of mirror descent is minimax optimal for partial monitoring using Rustichini's definition of regret.

Nicolò Cesa-Bianchi (Università degli Studi di Milano, https://cesa-bianchi.di.unimi.it/): The power of cooperation in networks of learning agents

Online, April 8, 2022, 11.00-12.00. https://uva-live.zoom.us/j/83684658173, Meeting ID: 836 8465 8173

Abstract: We study the power of cooperation in a network of communicating agents that solve a learning task. Agents use an underlying communication network to get information about what the other agents know. In the talk, we show the extent to which cooperation allows to prove performance bounds that are strictly better than the known bounds for non-cooperating agents. Our results are formulated in the setting of sequential decision-making with partial feedback.

Tomer Koren (Tel Aviv University, https://tomerkoren.github.io/): Benign Underfitting of Stochastic Gradient Descent

Online, March 11, 2022, 16.00-17.00. https://uva-live.zoom.us/j/85690850169, Meeting ID: 856 9085 0169.

We study to what extent may stochastic gradient descent (SGD) be understood as a ``conventional'' learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where SGD is classically known to minimize the population risk at an optimal rate, and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap lower bounded by a universal constant. Consequently, it turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or any other currently known generalization bound technique for that matter (other than that of its classical analysis). Time permitting, we will discuss related results for with-replacement SGD, multi-epoch SGD, and full-batch gradient descent.
Based on joint works with Idan Amir, Roi Livni, Yishay Mansour and Uri Sherman.

Samory Kpotufe (Columbia University, http://www.columbia.edu/~skk2175/): Some Recent Insights on Transfer and Multitask Learning.

Online, November 26, 2021, 16.00-17.00. https://uva-live.zoom.us/j/85155421740 Meeting ID: 851 5542 1740.

A common situation in Machine Learning is one where training data is not fully representative of a target population due to bias in the sampling mechanism or due to prohibitive target sampling costs. In such situations, we aim to ’transfer’ relevant information from the training data (a.k.a. source data) to the target application. How much information is in the source data about the target application? Would some amount of target data improve transfer? These are all practical questions that depend crucially on 'how far' the source domain is from the target. However, how to properly measure 'distance' between source and target domains remains largely unclear. In this talk we will argue that much of the traditional notions of 'distance' (e.g. KL-divergence, extensions of TV such as D_A discrepancy, density-ratios, Wasserstein distance) can yield an over-pessimistic picture of transferability. Instead, we show that some asymmetric notions of 'relatedness' between source and target (which we simply term 'transfer-exponents') capture a continuum from easy to hard transfer. Transfer-exponents uncover a rich set of situations where transfer is possible even at fast rates; they encode relative benefits of source and target samples, and have interesting implications for related problems such as 'multi-task or multi-source learning'. In particular, in the case of transfer from multiple sources, we will discuss (if time permits) a strange phenomena: no procedure can guarantee a rate better than that of having a single data source, even in seemingly mild situations where multiple sources are informative about the target. The talk is based on earlier work with Guillaume Martinet, and ongoing work with Steve Hanneke.

Gergely Neu (Universitat Pompeu Fabra, http://cs.bme.hu/~gergo/): Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

Online, June 25, 2021, 16.00-17.00. https://uva-live.zoom.us/j/81805477265 Meeting ID: 818 0547 7265.

We study the generalization properties of the popular stochastic gradient descent method for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution) and the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations to the final output. Our key technical tool is combining the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates.

Hedi Hadiji (Université Paris-Sud): Polynomial cost of adaptation for X-armed bandits

Leiden, February 11, 2020

The X-armed bandit problem is a sequential learning problem in which a player aims to maximize her reward by choosing among a continuum of actions. The minimax optimal rates of the regret depend on the Hölder regularity of the objective function. When the regularity is known, these optimal rates are easily achieved by standard methods. However, finding an algorithm that would adapt to the unknown regularity has been a challenging problem in the recent years. A recent lower bound of Locatelli and Carpentier (2018) shows that it is in fact impossible to reach the same rates as if the smoothness was known to the player. We revisit this lower bound, and define a family of admissible rate functions. We provide an algorithm that matches this lower bound, and therefore adapts as well as possible to the unknown smoothness. This characterizes a polynomial cost of adaptation in the X-armed bandit problem.

Emilie Kaufmann (University of Lille): Generalized Likelihood Ratio Tests applied to Sequential Decision Making

Leiden, May 3, 2019

In this talk I will discuss the use of a well-known statistical test, based the Generalized Likelihood Ratio (GLR), for different sequential decision making problems cast in the so-called multi-armed bandit framework. I a first part, I will focus on active testing in a bandit model, for which I will propose a choice of threshold for which a GLR-based stopping rule can be proved to return a correct answer with high probability. You will also understand why GLR are natural candidates for making a decision using a minimal number of samples. In a second part, I will show a different application of GLR tests to sequential decision making: GLR-based change-point detectors can be useful for minimizing regret in a piece-wise stationary multi-armed bandit.

Madalin Guta (University of Nottingham): An introduction to quantum state estimation

Leiden, May 3, 2019

Quantum state estimation is a key statistical tool for quantum engineering applications. In quantum computation for instance, one applies a certain sequence of transformations (quantum algorithm) to a register of two dimensional quantum systems (qubits), which often produces joint states with special correlation properties (entanglement). In order to verify that the preparation procedure has been successful, one needs to measure the final state, to obtain a random outcome whose distribution depends on the quantum state. By repeating the procedure over many independent preparations, one collects statistical data that can be used to statistically reconstruct the state.
In this talk I will introduce the basic concepts required to formulation the state estimation problem (also known as quantum tomography), and present a recently proposed method for doing this. The Projected Least Squares (PLS) estimator consists of first computing a traditional least square estimator, which is subsequently projected onto the space of quantum states (positive trace-one matrices). This method is faster that traditional maximum likelihood estimation, and has good statistical behaviour. Indeed, using matrix concentration inequalities we show that PLS attains fundamental lower bounds for the estimation of low rank states with separate measurements.
Time permitting, I will discuss a more fundamental question concerning the highest estimation precision allowed by quantum mechanics. Here the concept of quantum local asymptotic normality (QLAN) gives a precise answer to this question and indicates what measurements are required to achieve this precision

Joris Mooij (UvA): How To Learn Causal Relations From Data?

Amsterdam, April 12, 2019

Many questions in science, policy making and everyday life are of a causal nature: how would a change of A affect B? Causal inference, a branch of statistics and machine learning, studies how cause-effect relationships can be discovered from data and how these can be used for making predictions in situations where a system has been perturbed by an external intervention. The ability to reliably make such causal predictions is of great value for practical applications in a variety of disciplines. The standard method to discover causal relations is by using experimentation. Over the last decades, alternative methods have been proposed: constraint-based causal discovery methods can sometimes infer causal relations from certain statistical patterns in purely observational data. In this talk, I will introduce the basics of both approaches to causal discovery. I will discuss how these different ideas can be elegantly combined in Joint Causal Inference (JCI), a novel constraint-based approach to causal discovery from multiple data sets. This approach leads to a significant increase in the accuracy and identifiability of the predicted causal relations. One of the remaining big challenges is how to scale up the current algorithms such that large-scale causal discovery becomes feasible.

Frans Oliehoek (TU Delft): Bayesian Reinforcement Learning for Problems with State Uncertainty

Leiden, March 1, 2019

Sequential decision making under uncertainty is a challenging problem, especially when the decision maker, or agent, has uncertainty about what the true 'state' of the environment is. The partially observable Markov decision process (POMDP) framework models such problem that are 'partially observable': there are important pieces of information that are fundamentally hidden from the agent. Planning in a POMDP is difficult, but the problem gets even more complex when no accurate model of the environment is available. In such cases, the agent will need to update its belief over the environment, i.e., learn, during execution.

In this talk, I will introduce the POMDP framework, as well as a more recent extension to the learning setting called Bayes-Adaptive POMDP (BA-POMDP). I will explain how the learning problem can be tackled using a Monte Carlo tree search method called 'POMCP', how this can be made more efficient via a number of novel techniques, and how we can further increase its effectiveness by exploiting structure in the environment. Time permitting, I will also discuss extensions of this methodology that explicitly deal with coordination with other agents, and anticipation of other actors (such as humans) in the environment.

Zeynep Akata (University of Amsterdam): Explaining and Representing Novel Concepts With Minimal Supervision

Leiden, November 30, 2018

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image properties which justify visual predictions. In this talk, I will present my past and current work on Zero-Shot Learning, Vision and Language for Generative Modeling and Explainable Artificial Intelligence where we show (1) how to generalize image classification models to cases when no visual training data is available, (2) how to generate images andimage features using detailed visual descriptions, and (3) how our models focus on discriminating properties of the visible object, jointly predict a class label, explain why/not the predicted label is chosen for the image.

Dirk van der Hoeven (Leiden University): The Many Faces of Exponential Weights in Online Learning

Leiden, November 2, 2018

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. We explore the alternative approach of putting Exponential Weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization. D. van der Hoeven, T. van Erven and W. Kotłowski. Proceedings of Machine Learning Research, vol. 35: Proceedings of the 31st Conference on Learning Theory (COLT), pp. 2067-2092, 201.

Csaba Szepesvari (Deepmind/University of Alberta): Completing the Classification of Adversarial Partial Monitoring Games

Leiden, June 1, 2018

Partial monitoring is a generalization of bandit problems where the relationship between the feedback and the loss is loosened. As such, partial monitoring offers a rich framework to study the exploration-exploitation dilemma. In the finite, adversarial version of the problem that we consider here a learner and an adversary take actions in a sequential fashion from their respective finite action sets. A pair of actions result in a fixed loss and a fixed observation; the mapping that maps pairs of actions to the losses and observations is given to the learner ahead of time. The unknown are the actions taken by the adversary. Rather, the learner gains information of the adversaries actions by receiving the observation underlying the joint action of a round. Since what observations can be received in a round is governed by the action taken, the learner has control over what it can receive information about. The learner's goal, as usual, is to keep its total regret, the difference between the total loss suffered and the loss that could have been suffered had the had full knowledge of the adversaries actions and played the best response to these, as small possible.

The classification problem of partial monitoring games is concerned with determining how to play in a given game described by the map from joint actions to losses and observations so as to achieve at most a constant multiple of the minimax regret. The study of partial monitoring games started with the work Rusitchini in 1999, and significant advances were made by Piccolboni and Schindelhauer, Cesa-Bianchi, Lugosi and and Stoltz. Around 2010, the goal of the full classification emerged and an almost complete answer was given by the joint work of Bartok, Foster, Pal, Rakhlin and Szepesvari (2014).

This work gave an almost complete characterization of all possible partial monitoring and identified four regimes: trivial, easy, hard and impossible games. The characterization, however, left out a set of games, namely those when there is at least one actions that are optimal under some play of the adversary, but the region where this happens has a dimension defect. As a result, even though these "tricky" actions are nondominated, they could be omitted if not for the information they bring in. This creates a tricky situation when designing algorithms and until now deciding whether the presence of these tricky actions can move a game from the easy to the hard category remained open. The answer was known to be "no" for stochastic games when the adversary plays a fixed stochastic strategy. In this talk, I present the solution to this problem: It turns out that a simplified and improved version of the algorithm by Foster and Rakhlin can achieve O(T^(1/2)) regret for these problems, thus resolving the open problem and completing the characterization of finite adversarial partial monitoring games. In the talk, I will cover the key ideas of the characterization in previous works, then motivate and explain the new ideas that led to the new algorithm and explain the proof of the new result.

The talk is based on joint work with Tor Lattimore.

Konstantin Eckle (Leiden University): A comparison of deep networks with ReLU activation function and linear spline-type methods

Amsterdam, May 18, 2018

Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Since the function spaces induced by shallow networks have several approximation theoretic drawbacks, this explains, however, not necessarily the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation function to piecewise linear spline methods. We show that MARS (multivariate adaptive regression splines) is improper learnable by DNNs in the sense that for any given function that can be expressed as a function in MARS with M parameters there exists a multilayer neural network with O(Mlog(M/ε)) parameters that approximates this function up to sup-norm error ε. We show a similar result for expansions with respect to the Faber-Schauder system. Based on this, we derive risk comparison inequalities that bound the statistical risk of fitting a neural network by the statistical risk of spline-based methods. This shows that deep networks perform better or only slightly worse than the considered spline methods. We provide a constructive proof for the function approximations.

Alessandra Mattei, Peng Ding, and Fabrizia Mealli: Assessing causal effects in the presence of treatment switching through principal stratification

Leiden, May 4, 2018

In clinical trials focusing on survival outcomes for patients suffering from Acquired Immune Deficiency Syndrome (AIDS)-related illnesses and particularly painful cancers in advanced stages, patients in the control arm are often allowed to switch to the treatment arm if their physical conditions are worse than certain tolerance levels. The Intention To-Treat analysis, comparing groups formed by randomization regardless of the treatment actually received, is often used; although it provides valid causal estimates of the effect of assignment, it does not give information about the effect of the actual receipt of the treatment and ignores the information of treatment switching in the control group. Other existing methods in the literature propose to reconstruct the outcome a unit would have had if s/he had not switched but they are usually based on strong assumptions, like that there exist no relation between patient’s prognosis and switching behavior or the effect is constant in same scale. Clearly, the switching status of the units in the control group contains important post-treatment information and it is useful to characterize the heterogeneity of the treatment effect. We propose to re-define the problem of treatment switching using principal stratification and introduce new causal estimands, principal causal effects for patients belonging to subpopulations defined by the switching behavior under the control treatment, which appropriately adjust for the post-treatment information and characterize treatment effect heterogeneity. For inference, we use a Bayesian approach, which allows us to properly take into account that (i) switching happens in continuous time generating a continuum of principal strata; (ii) switching time is not defined for units who never switch in a particular experiment; and (iii) both survival time, the outcome of primary interest, and switching time are subject to censoring. We illustrate our framework using simulated data based on the Concorde study, a randomized controlled trial aimed to assess causal effects on time-to-disease progression or death of immediate versus deferred treatment with zidovudine among patients with asymptomatic HIV infection.

Johannes Schmidt-Hieber (MI Leiden): Statistical theory for deep neural networks with ReLU activation function

Amsterdam, April 6, 2018

The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does, however, not guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to logarithmic factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential parameters being much bigger than the sample size. Interestingly, the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that scaling the network depth with the logarithm of the sample size is natural.

Tim van Erven (MI Leiden): Welcome to the Zoo – Fast Rates in Statistical and Online Learning

Leiden, March 23, 2018

The great successes of learning theory characterize how fast we can learn the best parameters for the hardest possible learning task. However, both in statistical learning and in online learning, the hardest possible learning task is often pathological. For example, in classification it is most difficult to estimate parameters when the class labels are (close to) random, but in this case there is no point in doing classification anyway. Similarly, in online learning with expert advice, the hardest case happens when the experts are all making random guesses. But then there is nothing to gain from learning the best expert.

It is therefore interesting to look at ways of characterizing the difficulty of learning tasks, and indeed there exists a whole zoo of different conditions that allow learning at rates that are faster than is possible for the worst possible learning task. I will give a series of examples illustrating the most important conditions that allow such fast rates, and then explain how these conditions may all be understood as being essentially equivalent, at least for bounded losses.

Based on the following papers:

Fast Rates in Statistical and Online Learning - Van Erven, Grünwald, Mehta, Reid, Williamson. JMLR, 2015. (Special issue dedicated to the memory of Alexey Chervonenkis.)
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning - Koolen, Grünwald, Van Erven. NIPS, 2016.

Wouter Koolen (CWI): Bandit Algorithms for Best Arm Identification and Game Tree Search

Leiden, February 23, 2018

Sequential allocation problems with partial feedback (aka Bandit problems) have a rich tradition including applications to drug testing, online advertisement and more recently min-max game tree search. We will start by looking at the fundamental statistical identification problem. In particular, we will see how sample complexity lower bounds can be "inverted" to derive matching optimal algorithms. We will then look at extensions of the framework suitable for finding the optimal move in min-max game trees. Throughout, the talk will focus on the main statistical and computational challenges and ideas.

Mark van de Wiel (VUMC): Empirical Bayes for p>n: use of auxiliary information to improve prediction and variable selection

Leiden, December 1, 2017

Empirical Bayes (EB) is a versatile approach to 'learn from a lot' in two ways: first, from a large number of variables and second, from a potentially large amount of prior information, e.g. stored in public repositories. I will present applications of a variety of EB methods to several prediction methods, with examples on ridge regression and Bayesian models with a spike-and-slab prior. Both (marginal) likelihood and moment-based EB methods will be discussed. I consider a simple empirical Bayes estimator in a linear model setting to study the relation between the quality of an empirical Bayes estimator and p. I argue that EB is particularly useful when the prior contains multiple parameters, modeling a priori information on variables, termed 'co-data'. This will be illustrated with an application to cancer genomics. Finally, some ideas on how to include prior structural information in a ridge setting will be shortly discussed.

Ioan Gabriel Bucur (RU): Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness

Leiden, November 3, 2017

Causal effect estimation from observational data is an important and much studied research topic. The instrumental variable (IV) and local causal discovery (LCD) patterns are canonical examples of settings where a closed-form expression exists for the causal effect of one variable on another, given the presence of a third variable. Both rely on faithfulness to infer that the latter only influences the target effect via the cause variable. In reality, it is likely that this assumption only holds approximately and that there will be at least some form of weak interaction. This brings about the paradoxical situation that, in the large-sample limit, no predictions are made, as detecting the weak edge invalidates the setting. We introduce an alternative approach by replacing strict faithfulness with a prior that reflects the existence of many 'weak' (irrelevant) and 'strong' interactions. We obtain a posterior distribution over the target causal effect estimator which shows that, in many cases, we can still make good estimates. We demonstrate the approach in an application on a simple linear-Gaussian setting, using the MultiNest sampling algorithm, and compare it with established techniques to show our method is robust even when strict faithfulness is violated. This is joint work with Tom Claassen and Tom Heskes

Felix Lucka (CWI & UCL): Sparse Bayesian Inference and Uncertainty Quantification for Inverse Imaging Problems

Leiden, October 20, 2017

During the last two decades, sparsity has emerged as a key concept to solve linear and non-linear ill-posed inverse problems, in particular for severely ill-posed problems and applications with incomplete, sub-sampled data. At the same time, there is a growing demand to obtain quantitative instead of just qualitative inverse results together with a systematic assessment of their uncertainties (Uncertainty quantification, UQ). Bayesian inference seems like a suitable framework to combine sparsity and UQ but its application to large-scale inverse problems resulting from fine discretizations of PDE models leads to severe computational and conceptional challenges. In this talk, we will focus on two different Bayesian approaches to model sparsity as a-priori information: Via convex, but non-smooth prior energies such as total variation and Besov space priors and via non-convex but smooth priors arising from hierarchical Bayesian modeling. To illustrate our findings, we will rely on experimental data from challenging biomedical imaging applications such as EEG/MEG source localization and limited-angle CT. We want to share the experiences, results we obtained and the open questions we face from our perspective as researchers coming from a background in biomedical imaging rather than in statistics and hope to stimulate a fruitful discussion for both sides.

Gino Kpogbezan (MI Leiden): A sparse Bayesian SEM approach to network recovery using external knowledge

Amsterdam Science Park, October 6, 2017

We develop a sparse Bayesian Simultaneous Equations Models (SEMs) approach to network reconstruction which incorporates prior knowledge. We use an extended version of the horseshoe prior for the regressions parameters where on one hand priors have been assigned to hyperparameters and on the other hand the hyperparameters have been estimated by Empirical Bayes (EB). We use fast variational Bayes method for posterior densities approximation and compare its accuracy with that of MCMC strategy. Compared to their ridge counterpart, both models perform well in sparse situations, specially, the EB approach seems very promising. In a simulation study we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong.

Past Schedules

*Academic year 2017-2018*
Date	Speaker	Title	Location
June 1 15:00-16:00	Csaba Szepesvari Deepmind/University of Alberta	Completing the Classification of Adversarial Partial Monitoring Games	MI Leiden, Niels Bohrweg 1 room 401
May 18 15:00-16:00	Konstantin Eckle Leiden University	A comparison of deep networks with ReLU activation function and linear spline-type methods	UvA, Science Park 904 D1.115
May 4 15:00-16:00	Alessandra Mattei UniFL	Assessing causal effects in the presence of treatment switching through principal stratification	MI Leiden, Niels Bohrweg 1 room 401
April 6 15:00-16:00	Johannes Schmidt-Hieber Leiden University	Statistical theory for deep neural networks with ReLU activation function	UvA, Science Park 904 D1.115
March 23 15:00-16:00	Tim van Erven Leiden University	Welcome to the Zoo: Fast Rates in Statistical and Online Learning	MI Leiden, Niels Bohrweg 1 room 408
February 23 15:00-16:00	Wouter Koolen CWI	Bandit Algorithms for Best Arm Identification and Game Tree Search	UvA, Science Park 904 room D1.115
December 1 15:00-16:00	Mark van de Wiel VU medical center	Empirical Bayes for p>n: use of auxiliary information to improve prediction and variable selection	MI Leiden, Niels Bohrweg 1 room 408
November 3 15:00-16:00	I. Gabriel Bucur Radboud University	Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness	MI Leiden, Niels Bohrweg 1 room 408
October 20 15:00-16:00	Felix Lucka CWI & UCL	Sparse Bayesian Inference and Uncertainty Quantification for Inverse Imaging Problems	MI Leiden, Niels Bohrweg 1 room 408
October 6 15:00-16:00	Gino Kpogbezan Leiden University	A sparse Bayesian SEM approach to network recovery using external knowledge	UvA, Science Park 904, room D1.110
September 22 15:00-16:00	... University of Amsterdam	...	TBA room

*Academic year 2016-2017*
Date	Speaker	Title	Location
February 24 15:00-16:00	Jarno Hartog University of Amsterdam	Nonparametric Bayesian label prediction on a graph	UvA
March 17 15:00-16:00	Löic Schwaller Leiden University	Exact Bayesian inference for off-line change-point detection in tree-structured graphical models	Leiden
April 7 15:00-16:00	Peter Bloem Vrije Universiteit Amsterdam	Network motif detection at scale	Leiden
May 12 15:00-16:00	Nurzhan Nurushev Vrije Universiteit Amsterdam	Oracle uncertainty quantification for biclustering model	Leiden
October 7 15:00-16:00	Guus Regts University of Amsterdam	Approximation algorithms for graph polynomials and partition functions	UvA ScP 105-107 KdVI room F3.20
November 4 15:00-16:00	Marco Grzegorczyk Rijksuniversiteit Groningen	Bayesian inference of semi-mechanistic network models	UvA ScP 105-107 KdVI room F3.20
November 25 15:00-16:00	Joris Mooij University of Amsterdam	Automating Causal Discovery and Prediction	UvA ScP 105-107 KdVI room F3.20
December 9 15:00-16:00	Stephanie van der Pas Leiden University	Bayesian community detection	UvA ScP 105-107 KdVI room F3.20

*Academic year 2015-2016*
Date	Speaker	Title	Location
October 16 15:00-16:00	Ervin Tánczos Eindhoven University of Technology	Adaptive Sensing for Recovering Structured Sparse Sets	UvA ScP 904 room C0.110
October 23 15:00-16:00	Moritz Schauer University of Amsterdam	Working with graphs in Julia	UvA ScP 904 room A1.04
November 6 15:00-16:00	Fengnan Gao Leiden University	On the Estimation of the Preferential Attachment Network Model	UvA ScP 904 room A1.04
March 18 15:00-16:00	Paulo Serra University of Amsterdam	Dimension Estimation using Random Connection Models	UvA ScP 904 room F1.02
April 15 15:00-16:00	Rui Castro Eindhoven University of Technology	Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans	UvA ScP 904 room G2.10
April 22 15:00-16:00	Koen van Oosten Leiden University	Achieving Optimal Misclassification Proportion in Stochastic Block Model	UvA ScP 904 room G2.10
May 13 15:00-16:00	Wessel van Wieringen Vrije University Amsterdam	A tale of two networks: two GGMs and their differences	UvA ScP 904 room A1.04
June 3 15:00-16:00	Pariya Behrouzi Rijksuniversiteit Groningen	Detecting Epistatic Selection in the Genome of RILs via a latent Gaussian Copula Graphical Model	UvA ScP 904 room A1.10

*Academic year 2014-2015*
Date	Speaker	Title	Location
March 13 15:30	Chao Gao Yale University	Rate-optimal graphon estimation	Leiden University room 401
April 1 15:00-17:00	Moritz Schauer University of Amsterdam Botond Szabo University of Amsterdam	A graphical perspective on Gauss-Markov process priors Detecting community structures in networks	VU Amsterdam room WN-M607
April 17 14:00-16:00	Bartek Knapik Vrije University Johannes Schmiedt-Hieber Leiden University	Point process modelling for directed interaction networks Detecting community structures in networks	UvA ScP 904 room A1.10
April 24 14:00-16:00	Fengnan Gao Leiden University Stephanie van der Pas Leiden University	A quick survey in random graph models Stochastic block models	UvA ScP 904 room A1.10
May 1 15:00-16:00	Kolyan Ray Leiden University	Estimating Sparse Precision Matrix	UvA ScP 904 room A1.10
May 8 15:30-17:30	Gino B. Kpogbezan Leiden University Jarno Hertog University of Amsterdam	Variational Bayesian SEM for undirected Network recovery using external data Kernel-based regression	UvA ScP 904 room D1.116

*Academic year 2013-2014*
Date	Speaker	Title	Location
March 13 15:30	Leila Mohammadi Leiden University	A nonparametric view of network models	Leiden University room 312
April 4 15:00-16:00	Aad van der Vaart Leiden University Harry van Zanten University of Amsterdam	Stochastic block models Regression on graphs	UvA ScP 904 room A 1.06

Archive of Abstracts and Slides

Academic year 2016-2017

Nurzhan Nurushev: Oracle uncertainty quantification for biclustering model

Leiden, Friday, May 12, 2017

We study the problem of inference on the unknown parameter in the biclustering model by using the penalization method which originates from the empirical Bayes approach. The underlying biclustering structure is that the high-dimensional parameter consists of a few blocks of equal coordinates. The main inference problem is the uncertainty quantification (i.e., construction of a conference set for the unknown parameter), but on the way we solve the estimation problem as well. We pursue a novel local approach in that the procedure quality is characterized by a local quantity, the oracle rate, which is the best trade-off between the approximation error by a biclustering structure and the best performance for that approximating biclustering structure. The approach is also robust in that the additive errors in the model are not assumed to be independent (in fact, in general dependent) with some known distribution, but only satisfying certain mild exchangeable exponential moment conditions. We introduce the excessive bias restriction (EBR) under which we establish the local (oracle) confidence optimality of the proposed confidence ball. Adaptive minimax results (for the graphon estimation and posterior contraction problems) follow from our local results. The results for the stochastic block model follow, with implications for network modeling. [Joint work with E. Belitser.]

Peter Bloem: Network motif detection at scale

Leiden, April 7, 2017

Network motif analysis is a form of pattern mining on graphs. It searches for subgraphs that are unexpectedly frequent with respect to a null model. To compute the expected frequency of the subgraph, the search for motifs is normally repeated on as many as 1 000 random graphs sampled from the null model. This is an expensive operation that currently limits motif analysis to graphs of around 10 000 links. Using the minimum description length principle, we have developed an approximation that avoids the graph samples and computes motif significance efficiently, allowing us to perform motif detection on graphs with billions of links, using commodity hardware.

Loïc Schwaller: Exact Bayesian inference for off-line change-point detection in tree-structured graphical models

Leiden, March 17, 2017

We consider the problem of change-point detection in multivariate time-series. The multivariate distribution of the observations is supposed to follow a graphical model, whose graph and parameters are affected by abrupt changes throughout time. We demonstrate that it is possible to perform exact Bayesian inference whenever one considers a simple class of undirected graphs called spanning trees as possible structures. We are then able to integrate on the graph and segmentation spaces at the same time by combining classical dynamic programming with algebraic results pertaining to spanning trees. In particular, we show that quantities such as posterior distributions for change-points or posterior edge probabilities over time can efficiently be obtained. We illustrate our results on both synthetic and experimental data arising from biology and neuroscience.

Jarno Hartog: Nonparametric Bayesian label prediction on a graph

Leiden, March 17, 2017

I will present an implementation of a nonparametric Bayesian approach to solving binary classification problems on graphs. I consider a hierarchical Bayesian approach with a randomly scaled Gaussian prior.

Guus Regts: Approximation algorithms for graph polynomials and partition functions

The correlation decay method, pioneered by Weitz in 2006, is a method that yields efficient (polynomial time) deterministic approximation algorithms for computing partition functions of several statistical models. While the method yields deterministic algorithms it has a probabilistic flavour. In this talk I will sketch how this method works for the hardcore model, i.e., for counting independent sets in bounded degree graphs. After that I will discuss a different method pioneerd by Barvinok based on Taylor approximations of the logarithm of the partition function and on the location of zeros of the partition function. I will explain how this approach can give polynomial time approximation algorithms for computing several partition functions on bounded degree graphs.
This is based on joint work with Viresh Patel (UvA)

Marco Grzegorczyk: Bayesian inference of semi-mechanistic network models

A topical and challenging problem for statistics and machine learning is to infer the structure of complex systems of interacting units.In many scientific disciplines such systems are represented by interaction networks described by systems of differential equations. My presentation is about a novel semi-mechanistic Bayesian modelling approach for infering the structures and parameters of these interaction networks from data. The inference approach is based on gradient matching and a non-linear Bayesian regression model. My real.-world applications stem from the topical field of computational systems biology, where researchers aim to reconstruct the structure of biopathways or regulatory networks from postgenomic data. My focus is on investigating to which extent certain factors influence the network reconstruction accuracy. To this end, I compare not only (i) different methods for model selection, including various Bayesian information criteria and marginal likelihood approximation methods, but also (ii) different ways to approximate the gradients of the observed time series. Finally, I cross-compare the performance of the new method with a set of state-of-the art network reconstruction networks, such as Bayesian networks. Within the comparative evaluation studies I employ ANOVA schemes to disambiguate to which extents confounding factors impact on the network reconstruction accuracies.

Joris Mooij: Automating Causal Discovery and Prediction

The discovery of causal relationships from experimental data and the construction of causal theories to describe phenomena are fundamental pillars of the scientific method. How to reason effectively with causal models, how to learn these from data, and how to obtain causal predictions has been traditionally considered to be outside of the realm of statistics. Therefore, most empirical scientists still perform these tasks informally, without the help of mathematical tools and algorithms. This traditional informal way of causal inference does not scale, and this is becoming a serious bottleneck in the analysis of the outcomes of large-scale experiments nowadays. In this talk I will describe formal causal reasoning methods and algorithms that can help to automate the process of scientific discovery from data.

Stephanie van der Pas: Bayesian community detection

In the stochastic block model, nodes in a graph are partitioned into classes ('communities') and it is assumed that the probability of the presence of an edge between two nodes solely depends on their class labels. We are interested in recovering the class labels, and employ the Bayesian posterior mode for this purpose. We present results on weak consistency (where the fraction of misclassified nodes converges to zero) and strong consistency (where the number of misclassified nodes converges to zero) of the posterior mode , in the 'dense' regime where the probability of an edge occurring between two nodes remains bounded away from zero, and in the 'sparse' regime where this probability does go to zero as the number of nodes increases.

Academic year 2015-2016

Ervin Tánczos: Adaptive Sensing for Recovering Structured Sparse Sets

Consider the problem of recovering the support of a sparse signal, that is we are given an unknown s-sparse vector $x$, whose non-zero elements are $\mu>0$ and we are tasked with recovering the support of $x$. Suppose each coordinate of $x$ is measured independently with additive standard normal noise. In case the support can be any s-sparse set, we know that $\mu$ needs to scale as $\sqrt{\log n}$ for us to be able to reliably recover the support. However, in some practical settings the support set has a certain structure. For instance in gene-expression studies the signal support can be viewed a submatrix of the gene-expression matrix, or when searching for network anomalies the support set can be viewed as a star in the network graph. In such cases we might be able to recover the support of weaker signals. This question has been recently addressed by various authors. Now consider a setting where instead of measuring every coordinate of $x$ the same way, we can collect observations sequentially using our knowledge accumulated from previous observations. This setup is usually referred to as ``active learning" or ``adaptive sensing". We aim to characterize the difficulty of accurately recovering structured support sets using adaptive sensing, and also provide near optimal procedures for support recovery. In particular we are interested in the gains adaptive sensing provides over non-adaptive sensing in these situations. We consider two measurement models, namely coordinate-wise observations and compressive sensing. Our results show that adaptive sensing strategies can improve on non-adaptive ones both by better mitigating the effect of measurement noise, and capitalizing on structural information to a larger extent.

Moritz Schauer: Working with graphs in Julia

Julia is an emerging technical programming language, which has some properties which make it especially interesting for the implementation of Bayesian methods. In this talk I give an introduction into the graph-related functionality Julia provides. After a demonstration how to create and display graphs in Julia using the package Graphs.jl, I show how to perform Bayesian inference on a "smooth" function defined on a graph in Julia.

Fengnan Gao: On the Estimation of the Preferential Attachment Network Model

The preferential attachment (PA) network is a popular way of modeling the social networks, the collaboration networks and etc. The PA network model is an evolving network where new nodes keep coming in. When a new node comes in, it establishes only one connection with an existing node. The random choice on the existing node is via a multinormial distribution with probability weights based on a preferential function $f$ on the degrees. f is assumed apriori nondecreasing, which means the nodes with high degrees are more likely to get new connections, i.e. "the rich get richer". We proposed an estimator on f, that maps the natural numbers to the positive real line. We show, with techniques from branching process, our estimator is consistent. If $f$ is affine, meaning $f(k) = k + delta$, it is well known that such a model leads to a power-law degree distribution. We proposed a maximum likelihood estimator for delta and establish a central limit result on the MLE of delta.

Paulo Serra: Dimension Estimation using Random Connection Models

In statistics we often want to discover (sometimes impose) structure on observed data, and dimension plays a crucial role in this task. The setting that I will consider in this talk is the following: some high-dimensional data has been collected but it (potentially) lives in some lower dimensional space (this lower dimension is called the intrinsic dimension of the dataset); the objective is to estimate the intrinsic dimension of the high-dimensional dataset.
Why would we want to to this? Dimensionality reduction techniques (e.g., PCA, manifold learning) usually rely on knowledge about intrinsic dimension. Knowledge about dimension is also important to try to avoid the curse of dimensionality. From a computational perspective, the dimension of a dataset has impact in terms of the amount of space needed to store data (compressibility). The speed of algorithms is also commonly affected by the dimension of input data. One can also envision situations where we have access to some regression data, but the design points are unknown (this occurs, for example, in graphon estimation problems); the dimension of the design space has a large impact on the rate with which the regression function can be recuperated.
Our approach relies on having access to a certain graph: each vertex represents an obser- vation, and there is an edge between two vertices if the corresponding observations are close in some metric. We model this graph as a random connection model (a model from continuum percolation), and use this to propose estimators for the intrinsic dimension based on the dou- bling property of the Lebesgue measure. I will give some conditions under which the dimension can be estimated consistently, and some bounds on the probability of correctly recuperating an integer dimension. I will also show some numerical results and compare our estimators with some competing approaches from the literature.
This is joint work with Michel Mandjes.

Rui Castro (TU/e): Distribution-Free Detection of Structured Anomalies: Permutation and Rank-Based Scans

The scan statistic is by far the most popular method for anomaly detection, being popular in syndromic surveillance, signal and image processing and target detection based on sensor networks, among other applications. The use of scan statistics in such settings yields an hypothesis testing procedure, where the null hypothesis corresponds to the absence of anomalous behavior. If the null distribution is known calibration of such tests is relatively easy, as it can be done by Monte-Carlo simulation. However, when the null distribution is unknown the story is less straightforward. We investigate two procedures: (i) calibration by permutation and (ii) a rank-based scan test, which is distribution-free and less sensitive to outliers. A further advantage of the rank-scan test is that it requires only a one-time calibration for a given data size making it computationally much more appealing than the permutation-based test. In both cases, we quantify the performance loss with respect to an oracle scan test that knows the null distribution. We show that using one of these calibration procedures results in only a very small loss of power in the context of a natural exponential family. This includes for instance the classical normal location model, popular in signal processing, and the Poisson model, popular in syndromic surveillance. Numerical experiments further support our theory and results (joint work with Ery Arias-Castro, Meng Wang (UCSD) and Ervin Tánczos (TU/e)).

Koen van Oosten: Achieving Optimal Misclassification Proportion in Stochastic Block Model

Community detection is a fundamental statistical problem in network data analysis. Many algorithms have been proposed to tackle this problem. Most of these algorithms are not guaranteed to achieve the statistical optimality of the problem, while procedures that achieve information theoretic limits for general parameter spaces are not computationally tractable. In my talk I present a computationally feasible two-stage method that achieves optimal statistical performance in misclassification proportion for stochastic block model under weak regularity conditions. This two-stage procedure consists of a refinement stage motivated by penalized local maximum likelihood estimation. This stage can take a wide range of weakly consistent community detection procedures as initializer, to which it applies and outputs a community assignment that achieves optimal misclassification proportion with high probability.

Wessel van Wieringen: A tale of two networks: two GGMs and their differences

The two-sample problem is addressed from the perspective of Gaussian graphical models (GGMs), in exploratory and confirmatory fashion. The former amounts to the estimation of a precision matrix for each group. First, this is done group-wise by means of penalized maximum likelihood with an algebraically proper l2-penalty, for which an analytic expression of the estimator and its properties are derived. To link the groups the ridge penalty is then augmented with an fused term, which penalizes the difference between the group precisions. The confirmatory part concentrates on the situation in which partial correlations are systematically smaller/larger (in an absolute sense) in one of the groups. Data in both groups again are assumed to follow a GGM but now their partial correlations are proportional, differing by a multiplier (common to all partial correlations). The multiplier reflects the overall strength of the conditional dependencies. As before model parameters are estimated by means of penalized maximum likelihood, now using a ridge-like penalty. A permutation scheme to test for the multiplier differing from zero is proposed. A re-analysis of publicly available gene expression data on the Hedgehog pathway in normal and cancer prostate tissue combines both strategies to show its activation in the disease group.

Pariya Behrouzi: Detecting Epistatic Selection in the Genome of RILs via a latent Gaussian Copula Graphical Model

Recombinant Inbred Lines (RILs) derived from divergent parental lines can display extensive segregation distortion and long-range linkage disequilibrium (LD) between distant loci on same or different chromosomes. These genomic signatures are consistent with epistatic selection having acted on entire networks of interacting parental alleles during inbreeding. The reconstruction of these interaction networks from observations of pair-wise marker-marker correlations or pair-wise genotype frequency distortions is challenging as multiple testing approaches are under-powered and true long-range LD is difficult to distinguish from drift, particularly in small RIL panels. Here we develop an efficient method for reconstructing an underlying network of genomic signatures of high-dimensional epistatic selection from multi-locus genotype data. The network captures the conditionally dependent short- and long-range LD structure of RIL genomes and thus reveals aberrant marker-marker associations that are due to epistatic selection rather than gametic linkage. The network estimation relies on penalized Gaussian copula graphical models, which accounts for the large number of markers p and the small number of individuals n. We overcome the p >> n problem by using a penalized maximum likelihood technique that imposes an l1 penalty on the precision matrix of the latent process inside the EM estimation. A multi-core implementation of our algorithm makes it feasible to estimate the graph in high-dimensions (max markers approximately 3000). We demonstrate the efficiency of the proposed method on simulated datasets as well as on genotyping data in A.thaliana and Maize.

Academic year 2014-2015

Chao Gao: Rate Optimal Graphon Estimation

Network analysis is becoming one of the most active research areas in statistics. Significant advances have been made recently on developing theories, methodologies and algorithms for analyzing networks. However, there has been little fundamental study on optimal estimation. In this paper, we establish optimal rate of convergence for graphon estimation. Link: http://arxiv.org/abs/1410.5837

Botond Szabo: Detecting community structures in networks

I will review different algorithms for community detection described in: NEWMAN, M. E. J. (2004). Detecting community structure in networks. Eur. Phys. J. B 38 321-330. NEWMAN, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E (3) 74 036104, 19. MR2282139 MR2282139 (2007j:82115)

Moritz Schauer: A graphical perspective on Gauss-Markov process priors

In this short talk I look at the connections between Gauss-Markov process priors on a line and Gaussian Markov Random fields on a tree via the midpoint displacement procedure. The Markov-property of the prior corresponds to a sparsity constraint for the prior precision on the tree which allows to solve the Gaussian inverse problem under quasi-linear time and space constraints using a divide and conquer algorithm. This leads to the notion of computationally desirable sparsity properties connecting Gramian matrix stemming from an Gaussian inverse problem and the prior precision matrix.

Johannes Schmiedt-Hieber: High-dimensional covariance estimation

I am going to talk about the paper: Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti and Bin Yu High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence, EJS, 2011

Bartek Knapik: Point process modelling for directed interaction networks

I will present the paper: Perry, Patrick O.; Wolfe, Patrick J. Point process modelling for directed interaction networks. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 (2013), no. 5, 821-849

Fengnan Gao: A quick survey in random graph models

We will review several important random graph models, their definitions and important results on them. The models include Erdős–Rényi model, configuration model and preferential attachment model. We will focus on preferential attachment model. Most of the presentation is based on Remco van der Hofstad's lecture notes http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf

Stephanie van der Pas: Stochastic block models

I will review the paper: Antoine Channarond, Jean-Jacques Daudin, and Stéphane Robin. Classification and estimation in the Stochastic Blockmodel based on the empirical degrees. Electron. J. Statist. Volume 6 (2012), 2574-2601. Link: http://projecteuclid.org/euclid.ejs/1357913089

Kolyan Ray: Estimating Sparse Precision Matrix

I will present the paper: Cai, Tony, Weidong Liu and Harrison H. Zhou. Estimating Sparse Precision Matrix: Optimal Rates of Convergence and Adaptive Estimation Link: http://arxiv.org/abs/1212.2882

Gino B. Kpogbezan: Variational Bayesian SEM for undirected Network recovery using external data

Recently we developed a Bayesian structural equation model (SEM) framework with shrinkage priors for undirected network reconstruction. It was shown that Bayesian SEM in combination with variational Bayes is particularly attractive as it performs well, is computationally very fast and a flexible framework. A posteriori variable selection is feasible in our Bayesian SEM and so is the use of shrinkage priors. These shrinkage priors depend on all regression equations allowing borrowing of information across equations and improve inference when the number of features is large. An empirical Bayes procedure is used to estimate our hyperparameters. We also showed in simulations that our approach can outperform popular (sparse) methods. Here, we focus on addressing the problem of incorporating external data and/or prior information into network inference. In many settings information regarding network connectivity is often available. It is then natural to take such information into account during network reconstruction. Based on Bayesian SEM we propose a new model that focuses on the use of external data. It performs better than that of our Bayesian SEM when the external information is relevant, and as good when it is not.

Jarno Hertog: Kernel-based regression

I will discuss the basic kernel approach to regression in the context of graphs and present an number of methods to construct such kernels as in section 8.4 of Statistical Analysis of Network Data by Eric D. Kolaczyk.