Sebastian Wild

(AG Maschinelles Lernen, RPTU)
hosted by Department of Computer Science

"Powersort & Computing over compressed graph-structured data"

In the first part of the talk, I will tell the story of the adoption of our sorting algorithm Powersort as the default sorting method of Python.
Tim Peters invented Timsort, a clever Mergesort variant, for the CPython reference implementation of the Python programming language in 2001. Timsort is both effective in Python and a popular export product: it is used in many languages and frameworks, notably OpenJDK, the Android runtime, and the V8 JavaScript engine. Despite this success, algorithms researchers eventually exposed two flaws in Timsort's underlying merge policy. Based on ideas from optimal alphabetic trees, our Powersort merge policy finds nearly optimal merging orders with negligible overhead, and is now part of the CPython implementation (since Python 3.11).
The second part of the talk introduces the vision of computing over compressed data with a focus on recent work on graph-structured data.
The tremendous success of biological sequence analysis in applications is in no small part driven by algorithmic innovations in compressed text indexing from the last 25 years. These allow us to efficiently store, e.g., a human genome in less space than its plain-text representation while simultaneously supporting fast fuzzy string searches. For datasets with an underlying graph structure, such as knowledge graphs, social networks, or a brain's connectome, analogous techniques are still in their infancy. I will present our initial results for trees and hint at upcoming developments in this area.


Time: Tuesday, 23.01.2024, 13:00
Place:

Termin als iCAL Datei downloaden und in den Kalender importieren.