Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

publications

findere: Fast and Precise Approximate Membership Query

Published in Spire 2021, 2021

Approximate membership query (AMQ) structures such as Cuckoo filters or Bloom filters are widely used for representing large sets of elements. Their lightweight space usage explains their success, mainly as they are the only way to scale hundreds of billions or trillions of elements. However, they suffer by nature from non-avoidable false-positive calls that bias downstream analyses of methods using these data structures.

Download here

fimpera: drastic improvement of Approximate Membership Query data-structures with counts

Published in Bioinformatics, 2023

High throughput sequencing technologies generate massive amounts of biological sequence datasets as costs fall. One of the current algorithmic challenges for exploiting these data on a global scale consists in providing efficient query engines on these petabyte-scale datasets. Most methods indexing those datasets rely on indexing words of fixed length k, called k-mers. Many applications, such as metagenomics, require the abundance of indexed k-mers as well as their simple presence or absence, but no method scales up to petabyte-scaled datasets. This deficiency is primarily because storing abundance requires explicit storage of the k-mers in order to associate them with their counts. Using counting Approximate Membership Queries (cAMQ) data structures, such as counting Bloom filters, provides a way to index large amounts of k-mers with their abundance, but at the expense of a sensible false positive rate. Results

Download here

talks

Talk at DSB 2021

Published:

This talk was about the proof of concept of a “query time filtration” algorithm, which, applied on an AMQ, reduces its false positive rate.

Talk at SPIRE 2021

Published:

This talk was about findere, our published paper at the SPIRE conference.

Talk at DSB 2022

Published:

This talk was about fimpera, an algorithm wrapping datastructures recording the abundance of kmers in biological sequences. fimpera reduces the overestimations cause by the underlying datastructure it wraps.

Talk at JOBIM 2022

Published:

This talk was about findere, our first published paper.

teaching

introduction to C++

course, ENSAI, 2023

I taught an introduction to LaTeX during a semester of practical work sessions.