Ciclo de Palestras (Seminar Series) – Second Semester of 2020

Coordinator: Professor Kelly Cristina Mota Gonçalves

Due to the pandemic, the talks will be held over the coming months in the free virtual environment Google Meet (https://meet.google.com/ruv-ruxx-ehg). Talks take place on Wednesdays at 3:30 p.m., apart from a few duly indicated exceptions. The room will always open 10 minutes before the start of each session.

Full list (talks scheduled for future dates may change)

This work was motivated by a car insurance study comprising policies registered in mainland Portugal from 2011 to 2013 and involving some particularities, namely missing values and an excess of zeros in the data set. It aims to analyse how claim frequency is influenced by policy risk factors, so that risk profiles can be defined, adequate premiums charged to policyholders, and monetary losses for insurance companies reduced. The methodology is based on structured additive models, fitted under a Bayesian approach via Markov chain Monte Carlo methods. Model selection suggested a better fit for zero-inflated negative binomial models, which were then used to estimate actuarial quantities of interest.

Joint work with João Góis.
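
For reference, a generic zero-inflated negative binomial specification of the kind the model selection favoured reads as follows (a sketch with generic notation; the talk's exact parameterization and covariate structure are not reproduced here):

```latex
P(Y_i = y) =
\begin{cases}
\pi_i + (1 - \pi_i)\, p_{\mathrm{NB}}(0 \mid \mu_i, \theta), & y = 0,\\[2pt]
(1 - \pi_i)\, p_{\mathrm{NB}}(y \mid \mu_i, \theta),          & y = 1, 2, \ldots,
\end{cases}
```

where $p_{\mathrm{NB}}$ is the negative binomial probability mass function; in a structured additive setting, both $\log \mu_i$ and $\operatorname{logit}(\pi_i)$ would be sums of linear and smooth effects of the policy risk factors.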

Degradation data are used to make reliability assessments of highly reliable systems. The class of general path models is a popular tool for analysing degradation data. In this class of models, random effects correlate the degradation measures within each device. The random effects are interpreted in terms of degradation rates, which facilitates the specification of their prior distribution. The usual approaches assume that the devices under test come from a homogeneous population. This assumption is strong, especially if the variability in the manufacturing process is high or there is no guarantee that the devices operate under similar conditions. To account for heterogeneous degradation data, we develop semi-parametric degradation models based on Dirichlet process mixtures of both normal and skew-normal distributions. The proposed model accommodates different shapes for the degradation rate distribution and also allows estimation of the number of populations involved in the study. We prove that the proposed model also induces heterogeneity in the lifetime data. We introduce a method to build the prior distributions that adapts previous approaches to the context in which mixture models fit latent variables. We carry out simulation studies and data analyses to show the flexibility of the proposed model in capturing skewness, heavy tails and multimodal behavior of the random effects. Results show that the proposed models are competitive approaches for analysing degradation data.

Joint work with Cristiano C. Santos.
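
Schematically, the hierarchy at play has the following form (a sketch with generic notation, not the authors' exact specification): for the j-th degradation measurement of device i at time $t_{ij}$,

```latex
y_{ij} = \eta(t_{ij}; \beta_i) + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2),
\qquad \beta_i \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, G_0),
```

where $\eta$ is the degradation path, the random effects $\beta_i$ carry the degradation rates, and the Dirichlet process prior with a normal or skew-normal base measure $G_0$ lets the $\beta_i$ cluster into an unknown number of subpopulations.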

Maximum analysis consists of modeling the maxima of a data set with a specific distribution. Extreme value theory (EVT) shows that, for a sufficiently large block size, the distribution of maxima is approximated by the generalized extreme value (GEV) distribution. Under EVT, interest centers on the high quantiles of the distribution, so quantile regression techniques are well suited to the analysis of maxima based on the GEV distribution. In this context, this work presents a quantile regression extension for the GEV distribution. In addition, a time-varying quantile regression model is presented, and the important properties of this approach are displayed. Parameter estimation for these new models is carried out under the Bayesian paradigm. The results of applications to temperature and river level data show the advantage of this model, which allows us to estimate quantiles directly as functions of the covariates, showing which of them influence the occurrence of extreme temperatures and the magnitude of this influence.
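
For context, what makes quantile modeling natural here is that the GEV quantile function is available in closed form: if $Y \sim \mathrm{GEV}(\mu, \sigma, \xi)$ with $\xi \neq 0$, then

```latex
Q(p) = \mu + \frac{\sigma}{\xi}\left[(-\log p)^{-\xi} - 1\right], \qquad 0 < p < 1,
```

so a regression model can place covariates directly on the quantile, e.g. $Q_i(p) = x_i^{\top}\beta(p)$ (a sketch; the exact link and time-varying structure used in the talk are not reproduced here).
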
Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their extension to multivariate responses, however, is not straightforward, starting with the very definition of multivariate quantiles. In this talk, we present a flexible Bayesian quantile regression model for multivariate response variables, in which we are able to define a structured additive framework for all predictor variables. We build on previous ideas that take a directional approach to defining the quantiles of a response variable with multiple outputs, and we define noncrossing quantiles in every directional quantile model. We define a Markov chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained by using a Gaussian process design to model the correlation between the several quantile regression models. We illustrate the results of these models using two datasets: one on dimensions of inequality in the population, such as income and health; the other on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.

* joint work with Thomas Kneib (University of Goettingen)
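
As a rough illustration of the directional approach mentioned above (generic notation; not necessarily the exact definition adopted in the talk), for a unit direction $u$ the $\tau$-th directional quantile of a multivariate response $Y$ given covariates $x$ can be defined through the projection $u^{\top} Y$:

```latex
q_\tau(u \mid x) = \inf\left\{ q \in \mathbb{R} : P\!\left(u^{\top} Y \le q \mid x\right) \ge \tau \right\},
```

and noncrossing then means that $q_\tau(u \mid x)$ is nondecreasing in $\tau$ for every direction $u$ and covariate value $x$.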

Watch the talk on YouTube

This work investigates the causes of the high rates (up to 88%) of cesarean section (CS) in hospitals in Brazil. Evidence indicates that rates above 10-15% are correlated with maternal death, morbidity and near-death events. The usual approach to relating factors and outcomes in the birth network is based on regression models, which do not allow cause-effect inference. I propose a novel approach based on Bayesian networks to capture both nonlinearities and complex cause-effect relations. The proposed network integrates knowledge from experts, used to elicit the graph structure, with data from 12 hospitals (7,200 women), used to estimate the model parameters. The theoretical birth network, although described in papers in the area of public health, had not previously been mathematically constructed and confirmed by data. In particular, a quality improvement intervention called “Adequate Birth” (PPA) will be analyzed. The PPA was a pioneering project to reshape the birth care system in Brazil. The main results presented are (i) comprehensive guidelines to decrease CS rates based on the estimated Bayesian network, (ii) the integration of factors into a full model to be tested using data obtained from the PPA intervention, (iii) query analysis based on changes in the system, and (iv) a tool for policymakers aiming to optimize the cost-effectiveness of future interventions.

Joint work with Jacqueline T. Alves (Institute for Healthcare Improvement/Africa Region), Maria do Carmo Leal (Fiocruz), Rosa Domingues (Fiocruz) and Tatiana H. Leite (Fiocruz)
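
As a rough illustration of this kind of workflow, the sketch below uses the pgmpy Python library with a hypothetical, much-simplified graph and made-up variable names; it is not the actual PPA birth network.

```python
# Minimal discrete Bayesian-network workflow sketch (pgmpy 0.x API).
# Graph, variable names, and data are hypothetical placeholders.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Expert-elicited structure: edges point from cause to effect.
model = BayesianNetwork([
    ("hospital_type", "cesarean"),
    ("parity", "cesarean"),
    ("ppa_intervention", "cesarean"),
])

# Hypothetical patient-level records (in practice: the 7,200-women data set).
data = pd.DataFrame({
    "hospital_type":    ["public", "private", "public", "private"] * 50,
    "parity":           ["first", "multi", "multi", "first"] * 50,
    "ppa_intervention": ["yes", "no", "yes", "no"] * 50,
    "cesarean":         ["no", "yes", "no", "yes"] * 50,
})
model.fit(data, estimator=MaximumLikelihoodEstimator)

# Query analysis: P(cesarean | intervention), mimicking "changes in the system".
infer = VariableElimination(model)
print(infer.query(variables=["cesarean"], evidence={"ppa_intervention": "yes"}))
```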

Watch the talk on YouTube

Even though there is substantial literature on studies that pool survey data, it is still not clear which methodologies for pooling data from different surveys are the most efficient. For example, it is important to know whether the estimates from the different surveys involved should be given equal weight in the calculation of the combined statistics; if not, it should be clear how they should be weighted and why. In this paper, current and proposed methods for combining survey data are evaluated through simulation, in the contexts of simple random sampling, stratified random sampling and two-stage cluster random sampling from finite populations generated from superpopulation models. Simulation results suggest that the superpopulation variance does not influence the choice of weighting method; the population size, however, does appear to influence this choice. Combining samples improved the precision of estimates for all sampling techniques, regardless of the weighting method used.

*Joint work with Loveness Nyaradzo Dzikiti and Brendan Girdler-Brown from the University of Pretoria (South Africa)
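
As a toy illustration of the weighting question (not the authors' simulation design), the sketch below combines mean estimates from two simple random samples of very different sizes, comparing equal weights with size-proportional weights:

```python
# Combining two SRS mean estimates: equal vs. size-proportional weights.
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=50, scale=10, size=100_000)  # finite population
n1, n2 = 100, 900  # two surveys of very different sizes

err_equal, err_prop = [], []
for _ in range(5_000):
    m1 = rng.choice(population, n1, replace=False).mean()
    m2 = rng.choice(population, n2, replace=False).mean()
    err_equal.append((0.5 * m1 + 0.5 * m2) - population.mean())
    w = n1 / (n1 + n2)  # weight proportional to sample size
    err_prop.append((w * m1 + (1 - w) * m2) - population.mean())

print("MSE, equal weights:     ", np.mean(np.square(err_equal)))
print("MSE, size-proportional: ", np.mean(np.square(err_prop)))
```

Under simple random sampling, size-proportional weighting here coincides with inverse-variance weighting and yields a visibly smaller mean squared error than equal weighting.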

Watch the talk on YouTube

In this work we consider stationary and nonstationary time series models and multilevel models to represent longitudinal Item Response Theory (IRT) data. We develop a Bayesian inference framework, which includes parameter estimation, model fit assessment and model comparison tools, through MCMC algorithms. Simulation studies are conducted to assess parameter recovery. All computational implementations are carried out in WinBUGS, through the R2WinBUGS package for R. A real data analysis is presented, concerning a longitudinal cognitive study of mathematics achievement conducted by the Brazilian federal government.

*joint work with Jean-Paul Fox, University of Twente and Dalton F. Andrade, Universidade Federal de Santa Catarina
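
For orientation, longitudinal IRT models of this kind build on structures such as the following two-parameter probit sketch (generic notation; not the talk's exact specification): for examinee $i$, item $j$ and occasion $t$,

```latex
P(Y_{ijt} = 1 \mid \theta_{it}) = \Phi\!\left(a_j \theta_{it} - b_j\right),
\qquad \theta_{it} = \rho\, \theta_{i,t-1} + \zeta_{it},
```

where the latent abilities $\theta_{it}$ are linked over time, here through a first-order autoregression, or alternatively through multilevel random effects.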

Watch the talk on YouTube

When analyzing two-way data tables, with genotypes in the rows and environments in the columns, it is common to observe differential responses of genotypes across environments. This phenomenon is known as genotype by environment interaction (GEI) and can be defined as a change in the genetic ranking of genotypes across environments (e.g., in plant sciences, a genotype that is superior under well-watered conditions may yield poorly under dry conditions). GEI can be expressed either as crossovers, when two different genotypes change rank order of performance when evaluated in different environments, or as inconsistent responses of some genotypes across environments without changes in rank order. One step beyond GEI can be taken by considering the whole genetic information and analyzing the QTL (quantitative trait loci) by environment interaction (QEI). Structuring and better understanding these interactions requires modern statistical methods. In this talk, I will present generalizations of two fixed-effects models: the additive main effects and multiplicative interaction (AMMI) model and the genotype plus genotype by environment interaction (GGE) model. These generalizations are the robust AMMI and robust GGE models, which outperform their classical counterparts when outlying observations are present in the data. I will present model performance and comparisons in terms of QTL detection and QEI interpretation, considering applications to simulated and real data sets.
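
For reference, the classical fixed-effects AMMI decomposition that the robust version generalizes is standard (the robust variants change how it is estimated, not this basic form): for genotype $g$ in environment $e$,

```latex
y_{ge} = \mu + \alpha_g + \beta_e + \sum_{k=1}^{K} \lambda_k \gamma_{gk} \delta_{ek} + \varepsilon_{ge},
```

where $\alpha_g$ and $\beta_e$ are the additive main effects, and the multiplicative terms, obtained from a singular value decomposition of the interaction residuals, capture the GEI.
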
Among the many disparities for which Brazil is known is the difference in performance across students who attend the three administrative levels of Brazilian public schools: federal, state and municipal. Our main goal is to investigate whether student performance in the Brazilian Mathematical Olympics for Public Schools is associated with school administrative level and student gender. For this, we propose a hurdle hierarchical beta model for the scores of students who took the examination in the second phase of these Olympics in 2013. The mean of the beta model incorporates fixed and random effects at the student and school levels. We explore different distributions for the random school effect. As the posterior distributions of some fixed effects change with the presence, and with the distribution, of the random school effects, we also explore models that constrain the random school effects to the orthogonal complement of the fixed effects. We conclude that male students perform slightly better than female students and that, on average, federal schools perform substantially better than state or municipal schools. However, some of the best municipal and state schools perform as well as some federal schools. We hypothesize that this is due to individual teachers who successfully motivate and prepare their students to perform well in the mathematical Olympics.
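
In generic form, a hurdle beta model of this type has the following structure (a sketch; the links and effect structure are simplified relative to the talk): a score is zero with probability $p_i$, and otherwise

```latex
Y_i \mid Y_i > 0 \sim \mathrm{Beta}\!\left(\mu_i \phi,\; (1 - \mu_i)\phi\right),
\qquad \operatorname{logit}(\mu_i) = x_i^{\top}\beta + u_{s(i)},
```

where $u_{s(i)}$ is the random effect of student $i$'s school $s(i)$ and $\phi$ is a precision parameter.
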
In database management, record linkage aims to identify multiple records that correspond to the same individual. This task can be treated as a clustering problem, in which a latent entity is associated with one or more noisy database records. However, in contrast to traditional clustering applications, a large number of clusters with few observations per cluster is expected in this context. In this paper, we introduce a new class of prior distributions based on allelic partitions that is specially suited for the small cluster setting of record linkage. Our approach makes it straightforward to introduce prior information about the cluster size distribution at different scales, and naturally enforces sublinear growth of the maximum cluster size, known as the microclustering property. We evaluate the performance of our proposed class of priors using three official statistics data sets and show that our models provide competitive results compared to state-of-the-art microclustering models in the record linkage literature.
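
The microclustering property mentioned above has a precise statement: writing $M_n$ for the size of the largest cluster in a partition of $n$ records, a partition prior is said to microcluster when

```latex
M_n / n \longrightarrow 0 \quad \text{in probability as } n \to \infty,
```

in contrast with exchangeable partition priors such as the Dirichlet process, under which the largest cluster grows linearly in $n$.
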
Dengue fever, zika, and chikungunya are arboviral diseases transmitted by two vectors: Aedes aegypti and Aedes albopictus. In April 2016, the city of Rio de Janeiro experienced the peak of the first joint epidemic of the three diseases. As these diseases are transmitted by the same vectors, and notified cases are confirmed either by laboratory exam or by clinical-epidemiological criteria, we propose a model that allows for uncertainty in the allocation of the number of cases per disease per borough. We propose a Poisson model for the total number of cases of arboviral disease and, conditional on the total number of cases, a multinomial model for the numbers of cases of the three diseases.

We discuss different parametrizations of the log-relative risk of the total number of cases and of the parameters of the multinomial distribution. We have available the number of cases across the n = 160 boroughs of the city, the percentage of green area of each borough, a socioeconomic index and the population density. Inference is performed under the Bayesian framework. Our analysis suggests that as the percentage of green area increases, the relative risk for the total number of cases decreases. The odds of a borough having chikungunya instead of dengue decrease as the socioeconomic index increases, whereas the odds of having zika instead of chikungunya increase with the index. The odds ratio of zika or chikungunya with respect to dengue fever is not affected by the percentage of green area of the borough. This is joint work with Laís P. Freitas, Marília S. Carvalho and Oswaldo Cruz (Oswaldo Cruz Foundation, Brazil).
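
In symbols, the two-stage structure described above is (a sketch with generic notation; the offset $E_i$, e.g. expected counts, is an assumption added for illustration): for borough $i$ with total count $Y_i$,

```latex
Y_i \sim \mathrm{Poisson}(E_i \lambda_i), \qquad
(Y_{i1}, Y_{i2}, Y_{i3}) \mid Y_i \sim \mathrm{Multinomial}\!\left(Y_i, (p_{i1}, p_{i2}, p_{i3})\right),
```

with $\log \lambda_i$ and the multinomial log-odds (e.g. $\log(p_{ik}/p_{i1})$) modeled as functions of the green-area percentage, the socioeconomic index and the population density.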

Watch the talk on YouTube

Likelihood-free methods such as approximate Bayesian computation (ABC) have extended the reach of statistical inference to problems with computationally intractable likelihoods. Such approaches perform well for small- to moderate-dimensional problems but suffer from a curse of dimensionality in the number of model parameters. We will strive to provide a gentle overview of some of the state-of-the-art approaches in this area.
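
To fix ideas, a minimal ABC rejection sampler looks like the sketch below (a generic illustration of the basic idea, not any particular state-of-the-art method from the talk):

```python
# ABC rejection sampling: keep parameter draws whose simulated data
# fall close to the observed summary statistic.
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(loc=2.0, scale=1.0, size=100)  # "observed" data
s_obs = y_obs.mean()                              # summary statistic

def simulate(theta, size=100):
    """Stand-in simulator for a model with intractable likelihood."""
    return rng.normal(loc=theta, scale=1.0, size=size)

eps = 0.05  # tolerance
accepted = []
for _ in range(100_000):
    theta = rng.uniform(-10, 10)  # draw from the prior
    if abs(simulate(theta).mean() - s_obs) < eps:
        accepted.append(theta)

print(f"{len(accepted)} draws accepted; posterior mean ~ {np.mean(accepted):.2f}")
```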