SwePub
Sök i SwePub databas

  Utökad sökning

Träfflista för sökning "WFRF:(Gionis Aristides) "

Sökning: WFRF:(Gionis Aristides)

  • Resultat 1-50 av 60
Sortera/gruppera träfflistan
   
NumreringReferensOmslagsbildHitta
1.
  • Adriaens, Florian, et al. (författare)
  • Diameter Minimization by Shortcutting with Degree Constraints
  • 2022
  • Ingår i: 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 843-848
  • Konferensbidrag (refereegranskat)abstract
    • We consider the problem of adding a fixed number of new edges to an undirected graph in order to minimize the diameter of the augmented graph, and under the constraint that the number of edges added for each vertex is bounded by an integer. The problem is motivated by network-design applications, where we want to minimize the worst case communication in the network without excessively increasing the degree of any single vertex, so as to avoid additional overload. We present three algorithms for this task, each with their own merits. The special case of a matching augmentation -when every vertex can be incident to at most one new edge- is of particular interest, for which we show an inapproximability result, and provide bounds on the smallest achievable diameter when these edges are added to a path. Finally, we empirically evaluate and compare our algorithms on several real-life networks of varying types.
  •  
2.
  • Adriaens, Florian, et al. (författare)
  • Minimizing hitting time between disparate groups with shortcut edges
  • 2023
  • Ingår i: KDD 2023. - : Association for Computing Machinery (ACM). ; , s. 1-10
  • Konferensbidrag (refereegranskat)abstract
    • Structural bias or segregation of networks refers to situations where two or more disparate groups are present in the network, so that the groups are highly connected internally, but loosely connected to each other. Examples include polarized communities in social networks, antagonistic content in video-sharing or news-feed platforms, etc. In many cases it is of interest to increase the connectivity of disparate groups so as to, e.g., minimize social friction, or expose individuals to diverse viewpoints. A commonly-used mechanism for increasing the network connectivity is to add edge shortcuts between pairs of nodes. In many applications of interest, edge shortcuts typically translate to recommendations, e.g., what video to watch, or what news article to read next. The problem of reducing structural bias or segregation via edge shortcuts has recently been studied in the literature, and random walks have been an essential tool for modeling navigation and connectivity in the underlying networks. Existing methods, however, either do not offer approximation guarantees, or engineer the objective so that it satisfies certain desirable properties that simplify the optimization task. In this paper we address the problem of adding a given number of shortcut edges in the network so as to directly minimize the average hitting time and the maximum hitting time between two disparate groups. The objectives we study are more natural than objectives considered earlier in the literature (e.g., maximizing hitting-time reduction) and the optimization task is significantly more challenging. Our algorithm for minimizing average hitting time is a greedy bicriteria that relies on supermodularity. In contrast, maximum hitting time is not supermodular. Despite, we develop an approximation algorithm for that objective as well, by leveraging connections with average hitting time and the asymmetric k-center problem.
  •  
3.
  • Agarwal, D., et al. (författare)
  • Foreword from the Programme Chairs
  • 2022
  • Ingår i: 31st ACM World Wide Web Conference, WWW 2022. - : Association for Computing Machinery, Inc.
  • Konferensbidrag (refereegranskat)
  •  
4.
  • Anagnostopoulos, A., et al. (författare)
  • Collaborative procrastination
  • 2020
  • Ingår i: Leibniz International Proceedings in Informatics, LIPIcs. - : Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing.
  • Konferensbidrag (refereegranskat)abstract
    • The problem of inconsistent planning in decision making, which leads to undesirable effects such as procrastination, has been studied in the behavioral-economics literature, and more recently in the context of computational behavioral models. Individuals, however, do not function in isolation, and successful projects most often rely on team work. Team performance does not depend only on the skills of the individual team members, but also on other collective factors, such as team spirit and cohesion. It is not an uncommon situation (for instance, experienced by the authors while working on this paper) that a hard-working individual has the capacity to give a good example to her team-mates and motivate them to work harder. In this paper we adopt the model of Kleinberg and Oren (EC'14) on time-inconsistent planning, and extend it to account for the influence of procrastination within the members of a team. Our first contribution is to model collaborative work so that the relative progress of the team members, with respect to their respective subtasks, motivates (or discourages) them to work harder. We compare the total cost of completing a team project when the team members communicate with each other about their progress, with the corresponding cost when they work in isolation. Our main result is a tight bound on the ratio of these two costs, under mild assumptions. We also show that communication can either increase or decrease the total cost. We also consider the problem of assigning subtasks to team members, with the objective of minimizing the negative effects of collaborative procrastination. We show that whereas a simple problem of forming teams of two members can be solved in polynomial time, the problem of assigning n tasks to n agents is NP-hard.
  •  
5.
  • Andrienko, G., et al. (författare)
  • (So) Big Data and the transformation of the city
  • 2020
  • Ingår i: International Journal of Data Science and Analytics. - : Springer. - 2364-415X .- 2364-4168.
  • Tidskriftsartikel (refereegranskat)abstract
    • The exponential increase in the availability of large-scale mobility data has fueled the vision of smart cities that will transform our lives. The truth is that we have just scratched the surface of the research challenges that should be tackled in order to make this vision a reality. Consequently, there is an increasing interest among different research communities (ranging from civil engineering to computer science) and industrial stakeholders in building knowledge discovery pipelines over such data sources. At the same time, this widespread data availability also raises privacy issues that must be considered by both industrial and academic stakeholders. In this paper, we provide a wide perspective on the role that big data have in reshaping cities. The paper covers the main aspects of urban data analytics, focusing on privacy issues, algorithms, applications and services, and georeferenced data from social media. In discussing these aspects, we leverage, as concrete examples and case studies of urban data science tools, the results obtained in the “City of Citizens” thematic area of the Horizon 2020 SoBigData initiative, which includes a virtual research environment with mobility datasets and urban analytics methods developed by several institutions around Europe. We conclude the paper outlining the main research challenges that urban data science has yet to address in order to help make the smart city vision a reality.
  •  
6.
  • Aslay, Cigdem, et al. (författare)
  • Mining Frequent Patterns in Evolving Graphs
  • Annan publikation (övrigt vetenskapligt/konstnärligt)abstract
    • Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous appli- cations ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem. In this paper, we initiate the study of approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent k-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of k-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and emprically demonstrate that the proposed algorithms generate high-quality results compared to baselines.
  •  
7.
  • Aslay, Cigdem, et al. (författare)
  • Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks
  • 2021
  • Ingår i: 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 1152-1163
  • Konferensbidrag (refereegranskat)abstract
    • Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method, which can lead to significant efficiency gains when processing inference queries using the Variable Elimination algorithm. In particular, we address the problem of choosing a set of intermediate results to precompute and materialize, so as to maximize the expected efficiency gain over a given query workload. For the problem we consider, we provide an optimal polynomial-time algorithm and discuss alternative methods. We validate our technique using real-world Bayesian networks. Our experimental results confirm that a modest amount of materialization can lead to significant improvements in the running time of queries, with an average gain of 70%, and reaching up to a gain of 99%, for a uniform workload of queries. Moreover, in comparison with existing junction tree methods that also rely on materialization, our approach achieves competitive efficiency during inference using significantly lighter materialization.
  •  
8.
  •  
9.
  •  
10.
  • Ciaperoni, Martino, et al. (författare)
  • Concise and interpretable multi-label rule sets
  • 2022
  • Ingår i: 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM). - : Institute of Electrical and Electronics Engineers (IEEE). ; , s. 71-80
  • Konferensbidrag (refereegranskat)abstract
    • Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules, and thus, it offers better interpretability compared to black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy, and thus, to control the number of rules in the solution set. To tackle the said maximization problem we propose a 2-approximation algorithm, which relies on a novel technique to sample high-quality rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation, which indicates that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.
  •  
11.
  • Ciaperoni, Martino, et al. (författare)
  • Concise and interpretable multi-label rule sets
  • 2023
  • Ingår i: Knowledge and Information Systems. - : Springer Nature. - 0219-1377 .- 0219-3116. ; 65:12, s. 5657-5694
  • Tidskriftsartikel (refereegranskat)abstract
    • Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple “if-then” rules, and thus, it offers better interpretability compared to black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy, and thus, to control the number of rules in the solution set. To tackle the said maximization problem, we propose a 2-approximation algorithm, which circumvents the exponential-size search space of rules using a novel technique to sample highly discriminative and diverse rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation and a case study, which indicate that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.
  •  
12.
  • Ciaperoni, M., et al. (författare)
  • SIEVE : A Space-Efficient Algorithm for Viterbi Decoding
  • 2022
  • Ingår i: SIGMOD '22. - New York, NY, USA : Association for Computing Machinery (ACM). ; , s. 1136-1145
  • Konferensbidrag (refereegranskat)abstract
    • Can we get speech recognition tools to work on limited-memory devices? The Viterbi algorithm is a classic dynamic programming (DP) solution used to find the most likely sequence of hidden states in a Hidden Markov Model (HMM). While the algorithm finds universal application ranging from communication systems to speech recognition to bioinformatics, its scalability has been scarcely addressed, stranding it to a space complexity that grows with the number of observations. In this paper, we propose SIEVE (Space Efficient Viterbi), a reformulation of the Viterbi algorithm that eliminates its space-complexity dependence on the number of observations to be explained. SIEVE discards and recomputes parts of the DP solution for the sake of space efficiency, in divide-and-conquer fashion, without incurring a time-complexity overhead. Our thorough experimental evaluation shows that SIEVE is highly effective in reducing the memory usage compared to the classic Viterbi algorithm, while avoiding the runtime overhead of a naïve space-efficient solution.
  •  
13.
  • Ciaperoni, M., et al. (författare)
  • Workload-Aware Materialization of Junction Trees
  • 2022
  • Ingår i: Advances in Database Technology - EDBT. - : OpenProceedings.org. ; , s. 65-77
  • Konferensbidrag (refereegranskat)abstract
    • Bayesian networks are popular probabilistic models that capture the conditional dependencies among a set of variables. Inference in Bayesian networks is a fundamental task for answering probabilistic queries over a subset of variables in the data. However, exact inference in Bayesian networks is NP-hard, which has prompted the development of many practical inference methods. In this paper, we focus on improving the performance of the junction-tree algorithm, a well-known method for exact inference in Bayesian networks. In particular, we seek to leverage information in the workload of probabilistic queries to obtain an optimal workload-aware materialization of junction trees, with the aim to accelerate the processing of inference queries. We devise an optimal pseudo-polynomial algorithm to tackle this problem and discuss approximation schemes. Compared to state-of-the-art approaches for efficient processing of inference queries via junction trees, our methods are the first to exploit the information provided in query workloads. Our experimentation on several real-world Bayesian networks confirms the effectiveness of our techniques in speeding-up query processing. 
  •  
14.
  • Cinus, Federico, et al. (författare)
  • Rebalancing Social Feed to Minimize Polarization and Disagreement
  • 2023
  • Ingår i: CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. - : Association for Computing Machinery (ACM). ; , s. 369-378
  • Konferensbidrag (refereegranskat)abstract
    • Social media have great potential for enabling public discourse on important societal issues. However, adverse effects, such as polarization and echo chambers, greatly impact the benefits of social media and call for algorithms that mitigate these effects. In this paper, we propose a novel problem formulation aimed at slightly nudging users' social feeds in order to strike a balance between relevance and diversity, thus mitigating the emergence of polarization, without lowering the quality of the feed. Our approach is based on re-weighting the relative importance of the accounts that a user follows, so as to calibrate the frequency with which the content produced by various accounts is shown to the user. We analyze the convexity properties of the problem, demonstrating the non-matrix convexity of the objective function and the convexity of the feasible set. To efficiently address the problem, we develop a scalable algorithm based on projected gradient descent. We also prove that our problem statement is a proper generalization of the undirected-case problem so that our method can also be adopted for undirected social networks. As a baseline for comparison in the undirected case, we develop a semidefinite programming approach, which provides the optimal solution. Through extensive experiments on synthetic and real-world datasets, we validate the effectiveness of our approach, which outperforms non-trivial baselines, underscoring its ability to foster healthier and more cohesive online communities.
  •  
15.
  • Coupette, Corinna, et al. (författare)
  • Reducing Exposure to Harmful Content via Graph Rewiring
  • 2023
  • Ingår i: KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. - : Association for Computing Machinery (ACM). ; , s. 323-334
  • Konferensbidrag (refereegranskat)abstract
    • Most media content consumed today is provided by digital platforms that aggregate input from diverse sources, where access to information is mediated by recommendation algorithms. One principal challenge in this context is dealing with content that is considered harmful. Striking a balance between competing stakeholder interests, rather than block harmful content altogether, one approach is to minimize the exposure to such content that is induced specifically by algorithmic recommendations. Hence, modeling media items and recommendations as a directed graph, we study the problem of reducing the exposure to harmful content via edge rewiring. We formalize this problem using absorbing random walks, and prove that it is NP-hard and NP-hard to approximate to within an additive error, while under realistic assumptions, the greedy method yields a (1-1/e)-approximation. Thus, we introduce Gamine, a fast greedy algorithm that can reduce the exposure to harmful content with or without quality constraints on recommendations. By performing just 100 rewirings on YouTube graphs with several hundred thousand edges, Gamine reduces the initial exposure by 50%, while ensuring that its recommendations are at most 5% less relevant than the original recommendations. Through extensive experiments on synthetic data and real-world data from video recommendation and news feed applications, we confirm the effectiveness, robustness, and efficiency of Gamine in practice.
  •  
16.
  • Garcia-Pueyo, Lluis, et al. (författare)
  • Integrity 2020 : Integrity in Social Networks and Media
  • 2020
  • Ingår i: Proceedings of the 13th international conference on web search and data mining (WSDM '20). - New York, NY, USA : ASSOC COMPUTING MACHINERY. ; , s. 905-906
  • Konferensbidrag (refereegranskat)abstract
    • The first Workshop on Integrity in Social Networks and Media is held in conjunction with the 13th ACM Conference on Web Search and Data Mining (WSDM) in Houston, Texas, USA. The goal of the workshop is to bring together researchers and practitioners to discuss content and interaction integrity challenges in social networks and social media platforms.
  •  
17.
  • Garcia-Pueyo, Lluís, et al. (författare)
  • Integrity 2024: Integrity in Social Networks and Media
  • 2024
  • Ingår i: WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining. - : Association for Computing Machinery, Inc. ; , s. 1212-1213
  • Konferensbidrag (refereegranskat)abstract
    • Integrity 2024 is the fifth edition of the Workshop on Integrity in Social Networks and Media, held in conjunction with the ACM Conference on Web Search and Data Mining (WSDM) since the 2020 edition [1-4]. The goal of the workshop is to bring together academic and industry researchers working on integrity, fairness, trust and safety in social networks to discuss the most pressing risks and cutting-edge technologies to reliably measure and mitigate them. The event consists of invited talks from academic experts and industry leaders as well as peer-reviewed papers and posters through an open call-for-papers.
  •  
18.
  • Gionis, Aristides, et al. (författare)
  • Mining Temporal Networks
  • 2024
  • Ingår i: WWW 2024 Companion - Companion Proceedings of the ACM Web Conference. - : Association for Computing Machinery (ACM). ; , s. 1260-1263
  • Konferensbidrag (refereegranskat)abstract
    • In World Wide Web (WWW) systems, networks (or graphs) serve as a fundamental tool for representing, analyzing, and understanding linked data, providing significant insights into the underlying systems. Naturally, most real-world systems have inherent temporal information, e.g., interactions in social networks occur at specific moments in time and last for a certain period. Temporal networks, i.e., network data modeling temporal information, enable novel and fundamental discoveries about the underlying systems they model, otherwise not captured by static networks that ignore such temporal information. In this tutorial, we present state-of-the-art models and algorithmic techniques for mining temporal networks that can provide precious insights into a plethora of web-related applications. We present how temporal networks can be used to extract novel information, especially in web-related network data, and highlight the challenges that arise when modeling temporal information compared to traditional static network-based approaches. We first overview different temporal network models. We then show how such powerful models can be leveraged to extract novel insights through suitable mining primitives. In particular, we present recent advances addressing most foundational problems for temporal network mining—ranging from the computation of temporal centrality measures, temporal motif counting, and temporal communities to bursty events and anomaly detection.
  •  
19.
  • Jin, Yifei, et al. (författare)
  • Learning Cellular Coverage from Real Network Configurations using GNNs
  • 2023
  • Ingår i: 2023 IEEE 97th Vehicular Technology Conference, VTC 2023-Spring - Proceedings. - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • Cellular coverage quality estimation has been a critical task for self-organized networks. In real-world scenarios, deep-learning-powered coverage quality estimation methods cannot scale up to large areas due to little ground truth can be provided during network design & optimization. In addition, they fall short in producing expressive embeddings to adequately capture the variations of the cells' configurations. To deal with this challenge, we formulate the task in a graph representation and so that we can apply state-of-the-art graph neural networks, that show exemplary performance. We propose a novel training framework that can both produce quality cell configuration embeddings for estimating multiple KPIs, while we show it is capable of generalising to large (area-wide) scenarios given very few labeled cells. We show that our framework yields comparable accuracy with models that have been trained using massively labeled samples.
  •  
20.
  • Jin, Yifei, et al. (författare)
  • Open World Learning Graph Convolution for Latency Estimation in Routing Networks
  • 2022
  • Ingår i: 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN). - : Institute of Electrical and Electronics Engineers (IEEE).
  • Konferensbidrag (refereegranskat)abstract
    • Accurate routing network status estimation is a key component in Software Defined Networking. However, existing deep-learning-based methods for modeling network routing are not able to extrapolate towards unseen feature distributions. Nor are they able to handle scaled and drifted network attributes in test sets that include open-world inputs. To deal with these challenges, we propose a novel approach for modeling network routing, using Graph Neural Networks. Our method can also be used for network-latency estimation. Supported by a domainknowledge-assisted graph formulation, our model shares a stable performance across different network sizes and configurations of routing networks, while at the same time being able to extrapolate towards unseen sizes, configurations, and user behavior. We show that our model outperforms most conventional deep-learningbased models, in terms of prediction accuracy, computational resources, inference speed, as well as ability to generalize towards open-world input.
  •  
21.
  • Kalamatianos, Georgios, 1992- (författare)
  • Diversified Retrieval of Spatial Data with Context
  • 2022
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • The abundance and ubiquity of spatial datasets necessitates their effective and efficient retrieval. For instance, on the web, there are datasets with GIS objects or POIs (e.g. Spatialhadoop datasets), datasets with geo-tagged photographs (e.g. Flickr), online social networks (e.g. Facebook), and semantic knowledge graphs (e.g. Yago, DBpedia). Various spatial keyword search paradigms have been proposed that facilitate their easy retrieval by liberating users from understanding the database schemas (e.g. relational, NoSQL, RDF) and query languages (e.g. SQL, SPARQL). In a spatial keyword search paradigm the user inputs a set of keywords and a query location. The results consist of entities in geographical proximity to the query location and relevant to the query keywords. However, such results considering only relevance may include entities that are very similar to each other, compromising the query effectiveness by becoming overwhelming.In view of this limitation, we propose contextual and spatial diversification by facilitating the retrieval of spatial entities with diverse characteristics and directions with respect to the query location. We formally define the problem that combines relevance and diversity and prove that finding the optimal solutions is an NP-hard problem. Thus, we propose two greedy approximate algorithms with guaranteed quality.Furthermore, we investigate the proportional retrieval of spatial data. We argue that objects with similar context and nearby locations should be proportionally represented in the selection as this can represent insightful information to the users about the whole population of candidate objects. Proportionality dictates the pairwise comparison of all retrieved objects and hence bears a high computational cost. To address this, we propose novel algorithms which greatly reduce the cost of proportional object selection in practice. In addition, we propose pre-processing, pruning and approximate computation techniques which, combined, reduce the computational cost of the algorithms even further.Extensive empirical studies on two real datasets show that our proposed algorithms can be very fast and give timely results of excellent approximation quality. User evaluation verifies that users prefer results based on proportionality then based on diversification and lastly on relevance.
  •  
22.
  • Karlsson, Isak, et al. (författare)
  • Explainable time series tweaking via irreversible and reversible temporal transformations
  • 2018
  • Ingår i: 2018 IEEE International Conference on Data Mining (ICDM). - : IEEE. - 9781538691601 - 9781538691595 ; , s. 207-216
  • Konferensbidrag (refereegranskat)abstract
    • Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
  •  
23.
  • Karlsson, Isak, et al. (författare)
  • Locally and globally explainable time series tweaking
  • 2020
  • Ingår i: Knowledge and Information Systems. - : Springer Science and Business Media LLC. - 0219-1377 .- 0219-3116. ; 62:5, s. 1671-1700
  • Tidskriftsartikel (refereegranskat)abstract
    • Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP -hard, and focus on three instantiations of the problem using global and local transformations. In the former case, we investigate the k-nearest neighbor classifier and provide an algorithmic solution to the global time series tweaking problem. In the latter case, we investigate the random shapelet forest classifier and focus on two instantiations of the local time series tweaking problem, which we refer to as reversible and irreversible time series tweaking, and propose two algorithmic solutions for the two problems along with simple optimizations. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
  •  
24.
  • Kaveh, Amin (författare)
  • Modelling and Analysis of Probabilistic Networks
  • 2022
  • Doktorsavhandling (övrigt vetenskapligt/konstnärligt)abstract
    • As empirical data collection and inference is often an imperfect process, and many systems can be represented as networks, it is important to develop modelling and analysis methods for imperfect network data. The main focus of this dissertation is the probabilistic network model G = (V, E, p) in which each edge is associated with an independent existence probability. This model can be used to represent both collected data and our understanding about it in many applications such as biological and social network analysis. A probabilistic network with m probabilistic edges corresponds to 2m deterministic instances, known as possible worlds, and most of the existing network analysis measures can be represented as probability distributions. This introduces three challenges. The first challenge is to find methods to calculate or estimate the required measures with non-exponential computational time complexity. The second challenge arises due to the fact that many network analysis algorithms are designed to use single number measures such as degree and cannot deal with measures that are represented as probability distributions. Therefore, the second challenge in this field is to adapt network science algorithms such that they can utilise the information represented in measures’ probability distributions. The third challenge is the aggregation of information yielded from the analysis of possible worlds. This thesis has considered these three challenges. In particular, this thesis first scrutinises fundamental local measures in probabilistic networks. It proposes measures to compare nodes’ degree distributions and it introduces a method to estimate these measures. Moreover, it extends the concepts of ego network and ego betweenness to probabilistic networks, and proposes a method to estimate ego betweenness in this context. Second, this thesis focuses on the problem of probabilistic network sparsification which is a method to generate an alternative probabilistic network whose analysis is simpler than the original one. Third, this thesis studies for the first time the problem of overlapping community detection in probabilistic networks and proposes and compares different extensions of the clique percolation method for such networks.
  •  
25.
  • Lanciano, T., et al. (författare)
  • Explainable Classification of Brain Networks via Contrast Subgraphs
  • 2020
  • Ingår i: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. - New York, NY, USA : Association for Computing Machinery. ; , s. 3308-3318
  • Konferensbidrag (refereegranskat)abstract
    • Mining human-brain networks to discover patterns that can be used to discriminate between healthy individuals and patients affected by some neurological disorder, is a fundamental task in neuro-science. Learning simple and interpretable models is as important as mere classification accuracy. In this paper we introduce a novel approach for classifying brain networks based on extracting contrast subgraphs, i.e., a set of vertices whose induced subgraphs are dense in one class of graphs and sparse in the other. We formally define the problem and present an algorithmic solution for extracting contrast subgraphs. We then apply our method to a brain-network dataset consisting of children affected by Autism Spectrum Disorder and children Typically Developed. Our analysis confirms the interestingness of the discovered patterns, which match background knowledge in the neuro-science literature. Further analysis on other classification tasks confirm the simplicity, soundness, and high explainability of our proposal, which also exhibits superior classification accuracy, to more complex state-of-the-art methods.
  •  
26.
  • Matakos, Antonis, et al. (författare)
  • Maximizing the Diversity of Exposure in a Social Network
  • 2022
  • Ingår i: IEEE Transactions on Knowledge and Data Engineering. - : Institute of Electrical and Electronics Engineers (IEEE). - 1041-4347 .- 1558-2191. ; 34:9, s. 4357-4370
  • Tidskriftsartikel (refereegranskat)abstract
    • Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles. In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users. In the proposed setting, we take into account content and user leanings, and the probability of further sharing an article. Our model allows to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints. The resulting problem can be cast as maximizing a monotone and submodular function, subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence-maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.
  •  
27.
  • Matakos, A., et al. (författare)
  • Strengthening ties towards a highly-connected world
  • 2022
  • Ingår i: Data mining and knowledge discovery. - : Springer Nature. - 1384-5810 .- 1573-756X. ; 36:1, s. 448-476
  • Tidskriftsartikel (refereegranskat)abstract
    • Online social networks provide a forum where people make new connections, learn more about the world, get exposed to different points of view, and access information that were previously inaccessible. It is natural to assume that content-delivery algorithms in social networks should not only aim to maximize user engagement but also to offer opportunities for increasing connectivity and enabling social networks to achieve their full potential. Our motivation and aim is to develop methods that foster the creation of new connections, and subsequently, improve the flow of information in the network. To achieve our goal, we propose to leverage the strong triadic closure principle, and consider violations to this principle as opportunities for creating more social links. We formalize this idea as an algorithmic problem related to the densest k-subgraph problem. For this new problem, we establish hardness results and propose approximation algorithms. We identify two special cases of the problem that admit a constant-factor approximation. Finally, we experimentally evaluate our proposed algorithm on real-world social networks, and we additionally evaluate some simpler but more scalable algorithms. 
  •  
28.
  • Matakos, Antonis, et al. (författare)
  • Tell me something my friends do not know : diversity maximization in social networks
  • 2020
  • Ingår i: Knowledge and Information Systems. - : Springer. - 0219-1377 .- 0219-3116.
  • Tidskriftsartikel (refereegranskat)abstract
    • Social media have a great potential to improve information dissemination in our society, yet they have been held accountable for a number of undesirable effects, such as polarization and filter bubbles. It is thus important to understand these negative phenomena and develop methods to combat them. In this paper, we propose a novel approach to address the problem of breaking filter bubbles in social media. We do so by aiming to maximize the diversity of the information exposed to connected social-media users. We formulate the problem of maximizing the diversity of exposure as a quadratic-knapsack problem. We show that the proposed diversity-maximization problem is inapproximable, and thus, we resort to polynomial nonapproximable algorithms, inspired by solutions developed for the quadratic-knapsack problem, as well as scalable greedy heuristics. We complement our algorithms with instance-specific upper bounds, which are used to provide empirical approximation guarantees for the given problem instances. Our experimental evaluation shows that a proposed greedy algorithm followed by randomized local search is the algorithm of choice given its quality-vs.-efficiency trade-off.
  •  
29.
  • Matsuno, Ryuta, et al. (författare)
  • Improved mixing time for k-subgraph sampling*
  • 2020
  • Ingår i: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020. - Philadelphia, PA : Society for Industrial and Applied Mathematics Publications. ; , s. 568-576
  • Konferensbidrag (refereegranskat)abstract
    • Understanding the local structure of a graph provides valuable insights about the underlying phenomena from which the graph has originated. Sampling and examining k-subgraphs is a widely used approach to understand the local structure of a graph. In this paper, we study the problem of sampling uniformly k-subgraphs from a given graph. We analyse a few different Markov chain Monte Carlo (MCMC) approaches, and obtain analytical results on their mixing times, which improve significantly the state of the art. In particular, we improve the bound on the mixing times of the standard MCMC approach, and the state-of-the-art MCMC sampling method PSRW, using the canonical-paths argument. In addition, we propose a novel sampling method, which we call recursive subgraph sampling RSS, and its optimized variant RSS+. The proposed methods, RSS and RSS+, are provably faster than the existing approaches. We conduct experiments and verify the uniformity of samples and the efficiency of RSS and RSS+.
  •  
30.
  • Merchant, A., et al. (författare)
  • Succinct Graph Representations as Distance Oracles : An Experimental Evaluation
  • 2022
  • Ingår i: Proceedings of the VLDB Endowment. - : Association for Computing Machinery (ACM). - 2150-8097. ; , s. 2297-2306
  • Konferensbidrag (refereegranskat)abstract
    • Distance oracles answer shortest-path queries between any pair of nodes in a graph. They are often built using succinct graph representations such as spanners, sketches, and compressors to minimize oracle size and query answering latency. Node embeddings, in particular, offer graph representations that place adjacent nodes nearby each other in a low-rank space. However, their use in the design of distance oracles has not been suffciently studied. In this paper, we empirically compare exact distance oracles constructed based on a variety of node embeddings and other succinct representations. We evaluate twelve such oracles along three measures of efficiency: construction time, memory requirements, and query-processing time over fourteen real datasets and four synthetic graphs. We show that distances between embedding vectors are excellent estimators of graph distances when graphs are well-structured, but less so for more unstructured graphs. Overall, our findings suggest that exact oracles based on embeddings can be constructed faster than multi-dimensional scaling (MDS) but slower than compressed adjacency indexes, require less memory than landmark oracles but more than sparsifiers or indexes, can answer queries faster than indexes but slower than MDS, and are exact more often with a smaller additive error than spanners (that have multiplicative error) while not being lossless like adjacency lists. Finally, while the exactness of such oracles is infeasible to maintain for huge graphs even under large amounts of resources, we empirically demonstrate that approximate oracles based on GOSH embeddings can efficiently scale to graphs of 100M+ nodes with only small additive errors in distance estimations. 
  •  
31.
  • Nasir, M. Anis U., et al. (författare)
  • Mining Frequent Patterns in Evolving Graphs
  • Annan publikation (övrigt vetenskapligt/konstnärligt)abstract
    • Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous appli- cations ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem. In this paper, we initiate the study of approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent k-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of k-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and emprically demonstrate that the proposed algorithms generate high-quality results compared to baselines.
  •  
32.
  • Oettershagen, Lutz, et al. (författare)
  • Finding Densest Subgraphs with Edge-Color Constraints
  • 2024
  • Ingår i: WWW 2024 - Proceedings of the ACM Web Conference. - : Association for Computing Machinery (ACM). ; , s. 936-947
  • Konferensbidrag (refereegranskat)abstract
    • We consider a variant of the densest subgraph problem in networks with single or multiple edge attributes. For example, in a social network, the edge attributes may describe the type of relationship between users, such as friends, family, or acquaintances, or different types of communication. For conceptual simplicity, we view the attributes as edge colors. The new problem we address is to find a diverse densest subgraph that fulfills given requirements on the numbers of edges of specific colors. When searching for a dense social network community, our problem will enforce the requirement that the community is diverse according to criteria specified by the edge attributes. We show that the decision versions for finding exactly, at most, and at least h colored edges densest subgraph, where h is a vector of color requirements, are NP-complete, for already two colors. For the problem of finding a densest subgraph with at least h colored edges, we provide a linear-time constant-factor approximation algorithm when the input graph is sparse. On the way, we introduce the related at least h (non-colored) edges densest subgraph problem, show its hardness, and also provide a linear-time constant-factor approximation. In our experiments, we demonstrate the efficacy and efficiency of our new algorithms.
  •  
33.
  • Ordozgoiti, Bruno, et al. (författare)
  • Finding large balanced subgraphs in signed networks
  • 2020
  • Ingår i: The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020. - New York, NY, USA : Association for Computing Machinery (ACM). ; , s. 1378-1388
  • Konferensbidrag (refereegranskat)abstract
    • Signed networks are graphs whose edges are labelled with either a positive or a negative sign, and can be used to capture nuances in interactions that are missed by their unsigned counterparts. The concept of balance in signed graph theory determines whether a network can be partitioned into two perfectly opposing subsets, and is therefore useful for modelling phenomena such as the existence of polarized communities in social networks. While determining whether a graph is balanced is easy, finding a large balanced subgraph is hard. The few heuristics available in the literature for this purpose are either ineffective or non-scalable. In this paper we propose an efficient algorithm for finding large balanced subgraphs in signed networks. The algorithm relies on signed spectral theory and a novel bound for perturbations of the graph Laplacian. In a wide variety of experiments on real-world data we show that our algorithm can find balanced subgraphs much larger than those detected by existing methods, and in addition, it is faster. We test its scalability on graphs of up to 34 million edges.
  •  
34.
  • Ordozgoiti, Bruno, et al. (författare)
  • Generalized Leverage Scores : Geometric Interpretation and Applications
  • 2022
  • Ingår i: Proceedings 38th International Conference on Machine Learning (ICML). - : ML Research Press.
  • Konferensbidrag (refereegranskat)abstract
    • In problems involving matrix computations, the concept of leverage has found a large number of applications. In particular, leverage scores, which relate the columns of a matrix to the subspaces spanned by its leading singular vectors, are helpful in revealing column subsets to approximately factorize a matrix with quality guarantees. As such, they provide a solid foundation for a variety of machine-learning methods. In this paper we extend the definition of leverage scores to relate the columns of a matrix to arbitrary subsets of singular vectors. We establish a precise connection between column and singular-vector subsets, by relating the concepts of leverage scores and principal angles between subspaces. We employ this result to design approximation algorithms with provable guarantees for two well-known problems: generalized column subset selection and sparse canonical correlation analysis. We run numerical experiments to provide further insight on the proposed methods. The novel bounds we derive improve our understanding of fundamental concepts in matrix approximations. In addition, our insights may serve as building blocks for further contributions.
  •  
35.
  • Ordozgoiti, B., et al. (författare)
  • Provable randomized rounding for minimum-similarity diversification
  • 2022
  • Ingår i: Data mining and knowledge discovery. - : Springer Nature. - 1384-5810 .- 1573-756X. ; 36:2, s. 709-738
  • Tidskriftsartikel (refereegranskat)abstract
    • When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.
  •  
36.
  •  
37.
  • Preti, Giulia, et al. (författare)
  • Discovering Dense Correlated Subgraphs in Dynamic Networks
  • 2021
  • Ingår i: Advances In Knowledge Discovery And Data Mining, PAKDD 2021, Pt I. - Cham : Springer Nature. ; , s. 395-407
  • Konferensbidrag (refereegranskat)abstract
    • Given a dynamic network, where edges appear and disappear over time, we are interested in finding sets of edges that have similar temporal behavior and form a dense subgraph. Formally, we define the problem as the enumeration of the maximal subgraphs that satisfy specific density and similarity thresholds. To measure the similarity of the temporal behavior, we use the correlation between the binary time series that represent the activity of the edges. For the density, we study two variants based on the average degree. For these problem variants we enumerate the maximal subgraphs and compute a compact subset of subgraphs that have limited overlap. We propose an approximate algorithm that scales well with the size of the network, while achieving a high accuracy. We evaluate our framework on both real and synthetic datasets. The results of the synthetic data demonstrate the high accuracy of the approximation and show the scalability of the framework.
  •  
38.
  • Rozenshtein, Polina, et al. (författare)
  • Mining Dense Subgraphs with Similar Edges
  • 2021
  • Ingår i: MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III. - Cham : Springer Nature. ; , s. 20-36
  • Konferensbidrag (refereegranskat)abstract
    • When searching for interesting structures in graphs, it is often important to take into account not only the graph connectivity, but also the metadata available, such as node and edge labels, or temporal information. In this paper we are interested in settings where such metadata is used to define a similarity between edges. We consider the problem of finding subgraphs that are dense and whose edges are similar to each other with respect to a given similarity function. Depending on the application, this function can be, for example, the Jaccard similarity between the edge label sets, or the temporal correlation of the edge occurrences in a temporal graph. We formulate a Lagrangian relaxation-based optimization problem to search for dense subgraphs with high pairwise edge similarity. We design a novel algorithm to solve the problem through parametric MIN-CUT [15,17], and provide an efficient search scheme to iterate through the values of the Lagrangian multipliers. Our study is complemented by an evaluation on real-world datasets, which demonstrates the usefulness and efficiency of the proposed approach.
  •  
39.
  • Rozenshtein, P., et al. (författare)
  • The network-untangling problem : from interactions to activity timelines
  • 2021
  • Ingår i: Data mining and knowledge discovery. - : Springer. - 1384-5810 .- 1573-756X. ; 35:1, s. 213-247
  • Tidskriftsartikel (refereegranskat)abstract
    • In this paper we study a problem of determining when entities are active based on their interactions with each other. We consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge (u, v, t) ∈ E denotes an interaction between entities u and v at time t. We assume an activity model where each entity is active during at most k time intervals. An interaction (u, v, t) can be explained if at least one of u or v are active at time t. Our goal is to reconstruct the activity intervals for all entities in the network, so as to explain the observed interactions. This problem, the network-untangling problem, can be applied to discover event timelines from complex entity interactions. We provide two formulations of the network-untangling problem: (i) minimizing the total interval length over all entities (sum version), and (ii) minimizing the maximum interval length (max version). We study separately the two problems for k= 1 and k> 1 activity intervals per entity. For the case k= 1 , we show that the sum problem is NP-hard, while the max problem can be solved optimally in linear time. For the sum problem we provide efficient algorithms motivated by realistic assumptions. For the case of k> 1 , we show that both formulations are inapproximable. However, we propose efficient algorithms based on alternative optimization. We complement our study with an evaluation on synthetic and real-world datasets, which demonstrates the validity of our concepts and the good performance of our algorithms.
  •  
40.
  • Spoerhase, Joachim, et al. (författare)
  • A Constant-Factor Approximation Algorithm for Reconciliation k-Median
  • 2023
  • Ingår i: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023. - : ML Research Press. ; , s. 1719-1746
  • Konferensbidrag (refereegranskat)abstract
    • In the reconciliation k-median problem we ask to cluster a set of data points by picking k cluster centers so as to minimize the sum of distances of the data points to their cluster centers plus the sum of pairwise distances between the centers. The problem, which is a variant of classic k-median, aims to find a set of cluster centers that are not too far from each other, and it has applications, for example, when selecting a committee to deliberate on a controversial topic. This problem was introduced recently (Ordozgoiti and Gionis, 2019), and it was shown that a local-search-based algorithm is always within a factor O(k) of an optimum solution and performs well in practice. In this paper, we demonstrate a close connection of reconciliation k-median to a variant of the k-facility location problem, in which each potential cluster center has an individual opening cost and we aim at minimizing the sum of client-center distances and the opening costs. This connection enables us to provide a new algorithm for reconciliation k-median that yields a constant-factor approximation (independent of k). We also provide a sparsification scheme that reduces the number of potential cluster centers to O(k) in order to substantially speed up approximation algorithms. We empirically compare our new algorithms with the previous local-search approach, showing improved performance and stability. In addition, we show how our sparsification approach helps to reduce computation time without significantly compromising the solution quality.
  •  
41.
  • Thejaswi, Suhas, et al. (författare)
  • Diversity-Aware k-median : Clustering with Fair Center Representation
  • 2021
  • Ingår i: MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021. - Cham : Springer Nature. ; , s. 765-780
  • Konferensbidrag (refereegranskat)abstract
    • We introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity, gender, etc. We then ask to find a minimum-cost clustering of the data into k clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements. More precisely, we are given a set of clients C, a set of facilities F, a collection F = {F-1,...,Ft} of facility groups F-i subset of F, a budget k, and a set of lower-bound thresholds R = {r(1),..,r(t)}, one for each group in The diversity-aware k-median problem asks to find a set S of k facilities in F such that vertical bar S boolean AND F-i vertical bar >= r(i), that is, at least ri centers in S are from group and the k-median cost Sigma(c is an element of C) min(s is an element of S) d(c, s) is minimized. We show that in the general case where the facility groups may overlap, the diversity-aware k-median problem is NP-hard, fixed-parameter intractable with respect to parameter k, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and redblue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets.
  •  
42.
  • Thejaswi, Suhas, et al. (författare)
  • Finding Path Motifs in Large Temporal Graphs Using Algebraic Fingerprints
  • 2020
  • Ingår i: BIG DATA. - : Mary Ann Liebert Inc. - 2167-6461 .- 2167-647X. ; 8:5, s. 335-362
  • Tidskriftsartikel (refereegranskat)abstract
    • We study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several applications, for example, in recommending tours for tourists or detecting abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we consider, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution scales to massive graphs with up to a billion edges for a multiset query with 5 colors and up to 100 million edges for a multiset query with 10 colors, despite the problems being non-deterministic polynomial time-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For instance, in a real-world graph dataset with >6 million edges and a multiset query with 10 colors, we can extract an optimal solution in <8 minutes on a Haswell desktop with four cores.
  •  
43.
  • Thejaswi, Suhas, et al. (författare)
  • Pattern detection in large temporal graphs using algebraic fingerprints
  • 2020
  • Ingår i: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020. - Philadelphia, PA : Society for Industrial & Applied Mathematics (SIAM). ; , s. 37-45
  • Konferensbidrag (refereegranskat)abstract
    • In this paper, we study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multi-set of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several interesting applications, for example, recommending tours for tourists, or searching for abnormal behavior in a network of financial transactions. For the family of pattern-detection problems we define, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution can scale to massive graphs with up to hundred million edges, despite the problems being NP-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and highly optimized. For example, in a real-world graph dataset with more than six million edges and a multi-set query with ten colors, we can extract an optimal solution in less than eight minutes on a haswell desktop with four cores.
  •  
44.
  • Tommasini, R., et al. (författare)
  • Accepted Tutorials at The Web Conference 2022
  • 2022
  • Ingår i: WWW 2022 - Companion Proceedings of the Web Conference 2022. - New York, NY, USA : Association for Computing Machinery (ACM). ; , s. 391-399
  • Konferensbidrag (refereegranskat)abstract
    • This paper summarizes the content of the 20 tutorials that have been given at The Web Conference 2022: 85% of these tutorials are lecture style, and 15% of these are hands on. 
  •  
45.
  • Tu, Sijing, et al. (författare)
  • Adversaries with Limited Information in the Friedkin-Johnsen Model
  • 2023
  • Ingår i: KDD 2023. - : Association for Computing Machinery (ACM). ; , s. 2201-2210
  • Konferensbidrag (refereegranskat)abstract
    • In recent years, online social networks have been the target of adversaries who seek to introduce discord into societies, to undermine democracies and to destabilize communities. Often the goal is not to favor a certain side of a conflict but to increase disagreement and polarization. To get a mathematical understanding of such attacks, researchers use opinion-formation models from sociology, such as the Friedkin - Johnsen model, and formally study how much discord the adversary can produce when altering the opinions for only a small set of users. In this line of work, it is commonly assumed that the adversary has full knowledge about the network topology and the opinions of all users. However, the latter assumption is often unrealistic in practice, where user opinions are not available or simply difficult to estimate accurately. To address this concern, we raise the following question: Can an attacker sow discord in a social network, even when only the network topology is known? We answer this question affirmatively. We present approximation algorithms for detecting a small set of users who are highly influential for the disagreement and polarization in the network. We show that when the adversary radicalizes these users and if the initial disagreement/polarization in the network is not very high, then our method gives a constant-factor approximation on the setting when the user opinions are known. To find the set of influential users, we provide a novel approximation algorithm for a variant of MaxCut in graphs with positive and negative edge weights. We experimentally evaluate our methods, which have access only to the network topology, and we find that they have similar performance as methods that have access to the network topology and all user opinions. We further present an NP-hardness proof, which was left as an open question by Chen and Racz [IEEE Transactions on Network Science and Engineering, 2021].
  •  
46.
  • Tu, Sijing, et al. (författare)
  • Co-exposure maximization in online social networks
  • 2020
  • Ingår i: Advances in Neural Information Processing Systems. - : Neural information processing systems foundation.
  • Konferensbidrag (refereegranskat)abstract
    • Social media has created new ways for citizens to stay informed on societal matters and participate in political discourse. However, with its algorithmically-curated and virally-propagating content, social media has contributed further to the polarization of opinions by reinforcing users’ existing viewpoints. An emerging line of research seeks to understand how content-recommendation algorithms can be re-designed to mitigate societal polarization amplified by social-media interactions. In this paper, we study the problem of allocating seed users to opposing campaigns: by drawing on the equal-time rule of political campaigning on traditional media, our goal is to allocate seed users to campaigners with the aim to maximize the expected number of users who are co-exposed to both campaigns. We show that the problem of maximizing co-exposure is NP-hard and its objective function is neither submodular nor supermodular. However, by exploiting a connection to a submodular function that acts as a lower bound to the objective, we are able to devise a greedy algorithm with provable approximation guarantee. We further provide a scalable instantiation of our approximation algorithm by introducing a novel extension to the notion of random reverse-reachable sets for efficiently estimating the expected co-exposure. We experimentally demonstrate the quality of our proposal on real-world social networks.
  •  
47.
  • Tzeng, Ruo-Chun, 1993-, et al. (författare)
  • Discovering conflicting groups in signed networks
  • 2020
  • Ingår i: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). - : Neural information processing systems foundation.
  • Konferensbidrag (refereegranskat)abstract
    • Signed networks are graphs where edges are annotated with a positive or negative sign, indicating whether an edge interaction is friendly or antagonistic. Signed networks can be used to study a variety of social phenomena, such as mining polarized discussions in social media, or modeling relations of trust and distrust in online review platforms. In this paper we study the problem of detecting k conflicting groups in a signed network. Our premise is that each group is positively connected internally and negatively connected with the other k − 1 groups. A distinguishing aspect of our formulation is that we are not searching for a complete partition of the signed network; instead, we allow a subset of nodes to be neutral with respect to the conflict structure we are searching. As a result, the problem we tackle differs from previously-studied problems, such as correlation clustering and k-way partitioning. To solve the conflicting-group discovery problem, we derive a novel formulation in which each conflicting group is naturally characterized by the solution to the maximum discrete Rayleigh’s quotient (MAX-DRQ) problem. We present two spectral methods for finding approximate solutions to the MAX-DRQ problem, which we analyze theoretically. Our experimental evaluation shows that, compared to state-of-the-art baselines, our methods find solutions of higher quality, are faster, and recover ground-truth conflicting groups with higher accuracy
  •  
48.
  • Tzeng, Ruo-Chun, 1993-, et al. (författare)
  • Improved analysis of randomized SVD for top-eigenvector approximation
  • 2022
  • Ingår i: International Conference On Artificial Intelligence And Statistics, Vol 151. - : ML Research Press.
  • Konferensbidrag (refereegranskat)abstract
    • Computing the top eigenvectors of a matrix is a problem of fundamental interest to various fields. While the majority of the literature has focused on analyzing the reconstruction error of low-rank matrices associated with the retrieved eigenvectors, in many applications one is interested in finding one vector with high Rayleigh quotient. In this paper we study the problem of approximating the top-eigenvector. Given a symmetric matrix A with largest eigenvalue lambda(1), our goal is to find a vector (u) over cap that approximates the leading eigenvector u(1) with high accuracy, as measured by the ratio R((u) over cap) = lambda(-1)(1) (u) over cap (T)A (u) over cap/(u) over cap (T)(u) over cap. We present a novel analysis of the randomized SVD algorithm of Halko et al. (2011b) and derive tight bounds in many cases of interest. Notably, this is the first work that provides non-trivial bounds for approximating the ratio R((u) over cap) using randomized SVD with any number of iterations. Our theoretical analysis is complemented with a thorough experimental study that confirms the efficiency and accuracy of the method.
  •  
49.
  • Xiao, Han, et al. (författare)
  • Searching for polarization in signed graphs : a local spectral approach
  • 2020
  • Ingår i: The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020. - New York, NY, USA : Association for Computing Machinery (ACM). ; , s. 362-372
  • Konferensbidrag (refereegranskat)abstract
    • Signed graphs have been used to model interactions in social networks, which can be either positive (friendly) or negative (antagonistic). The model has been used to study polarization and other related phenomena in social networks, which can be harmful to the process of democratic deliberation in our society. An interesting and challenging task in this application domain is to detect polarized communities in signed graphs. A number of different methods have been proposed for this task. However, existing approaches aim at finding globally optimal solutions. Instead, in this paper we are interested in finding polarized communities that are related to a small set of seed nodes provided as input. Seed nodes may consist of two sets, which constitute the two sides of a polarized structure. In this paper we formulate the problem of finding local polarized communities in signed graphs as a locally-biased eigen-problem. By viewing the eigenvector associated with the smallest eigenvalue of the Laplacian matrix as the solution of a constrained optimization problem, we are able to incorporate the local information as an additional constraint. In addition, we show that the locally-biased vector can be used to find communities with approximation guarantee with respect to a local analogue of the Cheeger constant on signed graphs. By exploiting the sparsity in the input graph, an indicator-vector for the polarized communities can be found in time linear to the graph size. Our experiments on real-world networks validate the proposed algorithm and demonstrate its usefulness in finding local structures in this semi-supervised manner.
  •  
50.
  • Zhang, Guangyi, et al. (författare)
  • Coresets remembered and items forgotten : submodular maximization with deletions
  • Annan publikation (övrigt vetenskapligt/konstnärligt)abstract
    • In recent years we have witnessed an increase on the development of methods for submodular optimization, which have been motivated by the wide applicability of submodular functions in real-world data-science problems. In this paper, we contribute to this line of work by considering the problem of robust submodular maximization against unexpected deletions, which may occur due to privacy issues or user preferences. Specifically, we consider the minimum number of items an algorithm has to remember, in order to achieve a non-trivial approximation guarantee against adversarial deletion of up to d items. We refer to the set of items that an algorithm has to keep before adversarial deletions as a deletion-robust coreset.Our theoretical contributions are two-fold. First, we propose a single-pass streaming algorithm that yields a (1−2ϵ)/(4p)-approximation for maximizing a non-decreasing submodular function under a general p-matroid constraint and requires a coreset of size k+d/ϵ, where k is the maximum size of a feasible solution. To the best of our knowledge, this is the first work to achieve an (asymptotically) optimal coreset, as no constant-factor approximation is possible with a coreset of size sublinear in d. Second, we devise an effective offline algorithm that guarantees stronger approximation ratios with a coreset of size O(dlog(k)/ϵ). We also demonstrate the superior empirical performance of the proposed algorithms in real-life applications.
  •  
Skapa referenser, mejla, bekava och länka
  • Resultat 1-50 av 60

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy