Notes

Feb 17, 2026
Where's Zoé? - How to support accents and diacritics in Fabric Data Agents

Sometimes, the AI isn’t wrong, it’s just too literal. You search for zoe and your table contains a Zoé. Your Data Agent confidently replies No results found. Zoé didn’t disappear, she’s just hiding behind an accent!

Let’s unpack why that happens - and how to fix it correctly in Microsoft Fabric.

A Quick Note on Diacritics (and Why They Matter)

Let’s take a simple employee table with an id, a department, and a first and last name. The firm is a French one, so there are quite a few accents in the names.
```
id,prenom,nom,departement
1,Jean,Dupont,IT
2,Élodie,Martin,HR
3,François,Lambert,Finance
4,Marie,Curie,IT
5,André,Bernard,Marketing
6,Cécile,Durand,Finance
7,Pierre,Moreau,IT
8,Naïma,Legrand,HR
9,Luc,Petit,Marketing
10,Ève,Robert,IT
11,Paul,Dubois,Finance
12,Zoé,Merci,HR
13,Antoine,Girard,Marketing
14,Ça va,Test,Légal
15,Loïc,Deschamps,IT
16,Inès,Fournier,Finance
17,Thomas,Rousseau,HR
18,Audrey,Tremblay,Marketing
19,Étienne,Perrot,IT
20,Camille,Lefèvre,Finance
```
So, with this data loaded in a Lakehouse table, and after configuring our Data agent, let’s try to find Zoé.

If you read this article’s title, you were probably expecting that result 🙂. Before jumping into SQL, let’s clarify what we’re dealing with. A diacritic is a mark added to a letter that changes pronunciation or meaning. Some examples from languages written in the Latin script:
- é, è, ê (French)
- ñ (Spanish)
- ü (German)
- ç (French, Portuguese)
In many countries, removing the accent does not change identity in casual usage. “Zoé” and “Zoe” refer to the same person. But technically, they are different Unicode characters.
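To make that difference concrete, here is a small Python sketch. It is only an illustration using the standard `unicodedata` module, not something the Data Agent runs; the helper name `code_points` is made up for this example.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """Return the Unicode code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

plain = "Zoe"
accented = "Zo\u00e9"  # "Zoé" written with the precomposed character é (U+00E9)

# A plain string comparison treats the two names as different values.
names_match = plain == accented

# NFD normalization splits é into a base letter e plus a combining acute accent.
decomposed = unicodedata.normalize("NFD", accented)

print(names_match)              # False
print(code_points(accented))    # ['U+005A', 'U+006F', 'U+00E9']
print(code_points(decomposed))  # ['U+005A', 'U+006F', 'U+0065', 'U+0301']
```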

Now, let’s look at the SQL query generated by Data Agent.

The LLM powering the Data Agent has effectively transcribed our natural language query into SQL. So, we end up with the WHERE prenom = 'zoe' predicate. And the default rules for sorting and comparing character data in the SQL Endpoint for Lakehouse, what we call a *collation* (https://learn.microsoft.com/en-us/fabric/data-warehouse/collation), are accent-sensitive (to be precise, the default is Latin1_General_100_BIN2_UTF8, a binary collation that is case-sensitive as well). So, zoe is different from Zoé.

How to make the Data Agent smarter about accents?

At the time of writing, there is no way to change the collation of an SQL Endpoint to one that uses accent insensitive comparison. You must do this at query time:
```
-- COLLATE makes this one comparison case-insensitive (CI) and accent-insensitive (AI)
SELECT id, prenom, nom, departement
FROM employees
WHERE prenom COLLATE Latin1_General_CI_AI LIKE '%zoe%'
```
This query will return Zoé even if the search term is zoe. You can instruct the data agent to use this construction through data source-specific instructions.

Once the custom instructions are added, let’s ask our question again.

We found Zoé!

A word of caution about performance

I haven’t looked at the internal implementation details, but what we know for sure is that the Delta file format does not understand SQL-specific collations. That means the data is not physically organized in a way that helps the engine filter with an accent-insensitive comparison. The SQL engine will probably have to scan the entire dataset to apply this filter.

If it’s a one-off query and you have thousands of records, you’re probably fine with the solution above. If you’re running this query often and/or have millions of records, you are better off computing a new column with a normalized value.
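As a rough illustration of the normalized-column option, the sketch below builds an accent- and case-insensitive search key in plain Python. The function name `search_key` and the idea of storing its result next to the original value are assumptions made for this example; in a Fabric notebook you would apply the same logic when writing the table.

```python
import unicodedata

def search_key(value: str) -> str:
    """Strip combining marks and fold case so 'Zoé' and 'zoe' compare equal."""
    decomposed = unicodedata.normalize("NFD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()

# A few first names from the sample table above.
prenoms = ["Zoé", "Élodie", "François", "Naïma", "Jean"]

# Hypothetical extra column: each original value mapped to its normalized key.
normalized = {name: search_key(name) for name in prenoms}

# Lookups normalize the search term the same way before comparing.
matches = [name for name, key in normalized.items() if key == search_key("ZOE")]
print(matches)  # ['Zoé']
```

Because the key is computed once at write time, the query can then use a plain equality comparison on the extra column instead of a COLLATE clause. Note that this only removes combining marks: a ligature such as œ has no decomposition and is left untouched.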
Feb 6, 2026
2026-16 - Interesting reads

Building a C compiler without writing a line of code

Someone at Anthropic built a team of agents and, for $20k, built a C compiler that can compile Linux (among other OSS projects like SQLite), and compile and run Doom. My key takeaways:
- This pushes the future definition of SWE. Building a compiler is no small feat, and while we may have issues getting GHCP to write proper TypeScript code, they managed to build it for x86, Arm, and even RISC-V.
- It was made possible only thanks to some fundamentals, including tooling and testing. Because GCC has a large test suite they could leverage, agents were able to steer themselves. The comparison with “GCC needed X people for YY years” is therefore misleading: all these years of work were “summarized” in those “torture tests”. They’re the best spec possible, written and refined by humans.
- They are sharing the source code… of the compiler, but not the code for the agents or the tooling they used (they were running agents in parallel, each in its own Docker container).
- There are some recipes applicable outside of this experiment: lock files for parallel agents, persistent task lists, ideas files, working on the code/test output to limit context window pollution, …

Why LLMs hallucinate? Because of us!

TL;DR: A good consultant will try to answer any question. A great consultant is not afraid to say “I don’t know”. All LLM benchmarks push LLMs to be “good consultants” (both in training and post-training).
- There are “classes” of hallucinations. One of them is sparse facts (birthdays, …) that appear only once in the training data.
- The authors recommend adding explicit confidence thresholds to all major benchmarks to reward uncertainty in LLM output.
Feb 22, 2025
Practical tips to manage your inbox

About 10 years ago, with the rise of collaboration platforms like Slack and later Microsoft Teams, many predicted the death of email. Yet today, not only is email still alive, but it has arguably become even more essential.

Most modern apps now include notification features—GitHub, Jira, banking apps, even your car—but instead of reducing email volume, they’ve often contributed to its growth. Think about it: how many emails do you receive today from an app that was designed to reduce email? Probably a nonzero number.

Managing your email is a critical skill in tech and beyond. As a developer and data engineer, I receive a massive volume of emails weekly—project updates, system alerts, team discussions, and external communications. Without a system, it would be overwhelming.

Here’s how I handle my emails efficiently without stressing about ‘Inbox Zero.’

Inbox Zero is Overrated – Try “Inbox No-Scroll”

Aiming for Inbox Zero can be exhausting and counterproductive. Instead, I follow an “Inbox No-Scroll” approach:
- At any given time, my inbox fits within a single screen.
- If I have to scroll, it means I need to process more emails.
This keeps things manageable without the unrealistic pressure of a perfectly empty inbox. The goal is to ensure the inbox remains a workspace, not an archive.

Three Inboxes Are Better Than One

A single inbox quickly becomes a chaotic mix of everything. I separate my emails into three main inboxes:
1. External Inbox: All emails from outside my company. These may require quicker responses or different handling than internal communications.
2. CC Inbox: Emails where I’m copied but not in the To field.
3. Primary Inbox: Everything else!
This structure allows me to focus on what’s important without getting distracted by notifications or low-priority messages.

A routine to Handle Emails Efficiently

When I start working on my email, I use this priority order:
1. External emails first. My job is a public-facing one, so it’s important to be responsive and engaged when talking with our community members.
2. Then my manager, upper management, and my team.
3. Then all other messages in the primary inbox.
4. Then emails in CC. They are mostly informational, and can wait a day or two.
The CC inbox is by far the most effective change I’ve made to my inbox. If you want to try only one thing, choose this!

Stand Out with Colors

Color coding is a game-changer. I assign different colors to emails based on categories:
- Managing up (red): Emails from my manager and above.
- My direct team (teal).
- vTeam members (orange): Sometimes, when we’re working on a time-limited initiative or spike.
Here is a screenshot of my primary inbox as of now.

Use the Snooze Feature

Not every email requires immediate action, but some shouldn’t be lost either. Outlook’s Snooze feature helps by temporarily hiding emails and resurfacing them at the right time—whether it’s a follow-up for a project or a reminder about an upcoming deadline.

What About Notifications, newsletters and Automated Emails?

This is an area where I’m not satisfied. I have some ideas for them, but I haven’t tested them yet.

Have an Email Budget

Just like managing time or money, managing email effectively requires setting limits. I allocate specific time blocks in my schedule to check and respond to emails:
- Morning: Process priority emails and clear the inbox. Most people argue against checking your email first thing in the morning. But I work on an international team, and a lot happens in the US and APAC while I’m sleeping. I have a very short window to answer my APAC colleagues so they get a same-day response. Without it, asking one question and a follow-up can take a week!
- Midday: Quick check-in for anything urgent.
- End of Day: Final review and snooze or archive emails for later.
This prevents email from becoming a constant distraction and ensures focused work time.

Email isn’t going away, but how we manage it can evolve. By structuring my inbox, using tools like color coding and snooze, and setting clear time limits, I keep my emails under control without the stress of Inbox Zero.

Adopting an efficient email workflow can free up mental space for deeper work, making you more productive without feeling overwhelmed by constant notifications.
Feb 13, 2025
Best practices for Azure SQL Serverless and Fabric Databases

For several years now, we’ve had access to the serverless version of Azure SQL, a PaaS database that automatically stops when not in use and wakes up as soon as a request is made. The advantage? You only pay for storage when the database is off. This is particularly useful for databases that aren’t used frequently, like an app to manage time off or shared development databases.

Despite their benefits, serverless databases are not widely adopted. With the introduction of SQL Database in Microsoft Fabric, a serverless SaaS database, it’s an opportune time to revisit this topic and share some best practices.

Moving from Vertical Sizing to Horizontal Scaling

In traditional cloud databases, costs are primarily driven by the size of the server, measured in vCores/memory or DTUs. You need to provision the right resources to handle all requests, even if some don’t need full capacity or the database isn’t used during off-hours. With serverless databases, the vCore count is flexible—you set a maximum but may be billed for less. The main cost driver is now the duration your database is running.

While this may seem like a minor change, it has significant implications for implementation and production. For instance, leveraging SQL query cache becomes less relevant, while monitoring request numbers over time becomes more important.

Serverless-specific best practices are tied to one thing: auto-pause

The biggest advantage of serverless databases is their ability to pause. With a minimum auto-pause delay of 15 minutes in Azure SQL and Fabric, it’s crucial to determine if this feature suits your use case. Ideally, serverless databases are most beneficial for short or medium periods of use with long pauses. Take a few minutes to calculate and compare costs to ensure serverless is more economical than a provisioned database.
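One way to run that comparison is a back-of-the-envelope calculation like the Python sketch below. Every number in it is a placeholder, not a real Azure price, and the model is deliberately simplified: it ignores storage and assumes a constant number of billed vCores while the database is online. The function names are made up for this example.

```python
def serverless_monthly_cost(online_hours: float, billed_vcores: float,
                            price_per_vcore_hour: float) -> float:
    """Compute cost for the hours the database is online (not paused)."""
    return online_hours * billed_vcores * price_per_vcore_hour

def break_even_hours(provisioned_monthly_cost: float, billed_vcores: float,
                     price_per_vcore_hour: float) -> float:
    """Online hours per month above which serverless costs more than provisioned."""
    return provisioned_monthly_cost / (billed_vcores * price_per_vcore_hour)

# Placeholder figures for illustration only; look up current prices for your region.
PROVISIONED_MONTHLY = 400.0   # hypothetical fixed monthly cost of a provisioned database
PRICE_PER_VCORE_HOUR = 0.5    # hypothetical serverless price per vCore-hour
BILLED_VCORES = 2.0           # hypothetical average vCores billed while online

# Example workload: online 3 hours a day on 22 working days.
online_hours = 3 * 22
cost = serverless_monthly_cost(online_hours, BILLED_VCORES, PRICE_PER_VCORE_HOUR)
limit = break_even_hours(PROVISIONED_MONTHLY, BILLED_VCORES, PRICE_PER_VCORE_HOUR)

print(cost)   # 66.0
print(limit)  # 400.0
```

When you estimate online hours, remember that the database stays online (and billed) for the whole auto-pause delay after the last request.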

A few best practices to consider

Cache your data outside the DB

Most databases are read-heavy. That means that for one INSERT or UPDATE statement, you may have hundreds of SELECT statements hitting your DB. By caching read-heavy data outside the database, you can reduce the number of times the database needs to wake up, saving costs and improving performance. You could look at this article (https://learn.microsoft.com/en-us/azure/architecture/patterns/cache-aside) to learn more about this pattern.
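Here is a minimal sketch of the cache-aside idea in Python. An in-memory dictionary stands in for an external cache such as Redis, and `fake_query` stands in for a real database call; the class and function names are assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple

class CacheAside:
    """Minimal in-memory cache-aside helper with a time-to-live per entry."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]  # cache hit: the database is not touched
        value = self._load_from_db(key)  # cache miss: this call may wake the database
        self._entries[key] = (self._clock(), value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)

# Stand-in for a real query; it records how often the "database" is hit.
db_calls = []
def fake_query(key: str) -> str:
    db_calls.append(key)
    return f"value-for-{key}"

cache = CacheAside(fake_query, ttl_seconds=300)
first = cache.get("employee:12")
second = cache.get("employee:12")
print(first == second, len(db_calls))  # True 1
```

The time-to-live is the knob to tune here: the longer it is, the fewer wake-ups you trigger, at the price of staler reads.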

Properly configure your ORM

Timeout and connection pooling are crucial settings in your Object-Relational Mapping (ORM) tool. Some default settings may be too aggressive if you need to wait for database wakeup/autostart. Here’s an example of how to configure Entity Framework Core for optimal performance with a serverless database:
```
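// Inside your DbContext subclass (requires: using System; using Microsoft.EntityFrameworkCore;)
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseSqlServer(
        connectionString,
        sqlServerOptionsAction: sqlOptions =>
        {
            // Retry transient failures, such as the first connection attempt
            // against a paused database that is still resuming.
            sqlOptions.EnableRetryOnFailure(
                maxRetryCount: 5,
                maxRetryDelay: TimeSpan.FromSeconds(30),
                errorNumbersToAdd: null);
            // CommandTimeout takes a number of seconds (int), not a TimeSpan.
            sqlOptions.CommandTimeout(60);
        });
}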
```
Measure Database wake up

Monitoring the time it takes for your database to wake up from a paused state can help you optimize performance. Use tools like Azure Monitor to track wake-up times and adjust your auto-pause settings accordingly. It will also help you make sure that “nobody” is waking up your database unexpectedly. Let’s take a simple example: you containerize your app. In development, everything is fine. But in production, your database is always up. The culprit: Docker healthchecks and Kubernetes liveness probes. You’ve done your job by implementing a thorough probe that even checks whether your database is healthy, and it comes back to bite you.

Be transparent with your users

You have made a cost-conscious choice that happens to be a sustainable one as well. And only sometimes, some of your users may have to wait for a few seconds? Let them know!

Inform your users about potential delays due to database wake-up times. Transparency can help manage expectations and improve user satisfaction.

Depending on your Framework, this could be quite easy to implement. Here is a very naive implementation in React. In an ideal situation, your checks may be a bit more advanced. You may want to do this only for the first-time request (and save it in a cookie?) or have a “server ping” method: If you can reach the server, that certainly means that you’re in wakeup phase and you’re safe to display the message.

Delay writes

It might not be applicable to all applications, but you could delay your writes if there is no active connection. You could send the message to a Service Bus queue that will get processed by a worker. Yes, that may not be the experience you want, but if you’re building a survey app, you could wait something like one hour to get your results.
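The shape of that idea can be sketched in a few lines of Python. Here a standard-library `queue.Queue` stands in for the Service Bus queue and a list stands in for the database; the function `drain_pending_writes` and the record fields are hypothetical names for this example, not a real Service Bus API.

```python
import queue
from typing import Callable, List

def drain_pending_writes(pending: "queue.Queue[dict]",
                         write_batch: Callable[[List[dict]], None]) -> int:
    """Empty the queue and send everything to the database in one batch."""
    batch: List[dict] = []
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    if batch:
        write_batch(batch)  # single wake-up for the whole batch
    return len(batch)

# In-memory stand-in for the message queue.
pending_writes: "queue.Queue[dict]" = queue.Queue()

# Stand-in for the real INSERT logic; it records what would be written.
written_batches: List[List[dict]] = []

# The application enqueues survey answers instead of writing them directly.
pending_writes.put({"survey_id": 1, "answer": "yes"})
pending_writes.put({"survey_id": 1, "answer": "no"})

# A scheduled worker (for example hourly) drains the queue.
count = drain_pending_writes(pending_writes, written_batches.append)
print(count, len(written_batches))  # 2 1
```

When the queue is empty, the worker never calls `write_batch`, so an idle hour does not wake the database at all.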

There is a lifetime Azure SQL Serverless free tier! With 100,000 vCore-seconds per month and auto-pause at 15 minutes, each wake-up with 5 minutes of queries keeps the database online for about 20 minutes, which consumes 600 vCore-seconds at 0.5 vCore. That lets you wake up your DB about 166 times a month, or roughly every 4.5 hours. And you can schedule the execution of an Azure Container Apps job every x hours quite easily.
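The arithmetic behind those figures, spelled out as a short Python calculation. The 31-day month is an assumption made here to turn the number of wake-ups into an interval; the other values come from the paragraph above.

```python
FREE_VCORE_SECONDS_PER_MONTH = 100_000  # free allowance quoted above
BILLED_VCORES = 0.5                     # vCores billed while the database is online
ACTIVE_MINUTES = 5                      # time spent running queries per wake-up
AUTO_PAUSE_DELAY_MINUTES = 15           # idle time before the database pauses

# The database stays online (and billed) until the auto-pause delay has elapsed.
online_seconds = (ACTIVE_MINUTES + AUTO_PAUSE_DELAY_MINUTES) * 60
cost_per_wakeup = online_seconds * BILLED_VCORES  # in vCore-seconds

wakeups_per_month = int(FREE_VCORE_SECONDS_PER_MONTH // cost_per_wakeup)

# Assumption: a 31-day month, to turn the count into an interval.
hours_between_wakeups = 31 * 24 / wakeups_per_month

print(cost_per_wakeup)                  # 600.0
print(wakeups_per_month)                # 166
print(round(hours_between_wakeups, 1))  # 4.5
```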

One more best practice for SQL Database in Fabric

One more notable feature of SQL Database in Fabric is its real-time analytics store replication, which can be queried from a SQL analytics endpoint. If you accept a small latency of less than a minute, you can use the SQL analytics endpoint for read operations and the SQL Database endpoint for write operations. This setup allows for efficient separation of read and write workloads, enhancing performance and scalability. Please note that, in the case of Fabric, you’re still paying for the number of CUs attached to the capacity, whether you’re using SQL Database or the SQL analytics endpoint. The main interest here is to spare some CUs to do something else on your stack.
Feb 4, 2025
Rebuilding My Blog - Don’t Wait Until It’s Finished to Make It Happen

My blog has been under construction for a few years now, but instead of waiting for perfection, I’ve decided to rebuild it from scratch. Alongside this, I’m launching a new Markdown notes website, an “antechamber” where I can quickly share drafts and thoughts as they come.

An Email-Based Blog: Why Outlook is the Best Tool

One of the coolest aspects of this new approach is using my email client—Outlook—as a blogging tool. Here’s why:
- Drafts Galore: I can have as many drafts as I want, all neatly organized.
- Sync Across Devices: My content automatically syncs across all devices, so I can start writing from my phone, whether I’m waiting somewhere or on the go.
- Rich Editing Features: Outlook offers everything needed for writing, from spelling and grammar checks to Copilot for writing assistance.
- Quick Image Addition: I can easily add images, even drawing directly on the Outlook mobile app, which saves as images.
- Easy Scheduling: With Outlook’s ability to schedule emails, I can plan my posts to go live at any chosen time. This means I can prepare a post and have it published with just a minute’s notice.

How to Send an Email and Have It Posted Online

Inspired by the HEY World (https://www.hey.com/world/) service, my new blog architecture is both innovative and straightforward. By leveraging a combination of email servers, Azure Functions, and GitHub Actions, I’ve created a seamless process to turn an email into a blog post.

For now, the code is private, but I’m happy to open-source it if anybody finds it useful.
