Where's Zoé? - How to support accents and diacritics in Fabric Data Agents
Sometimes, the AI isn't wrong; it's just too literal. You search for "zoe", and your table contains a "Zoé". Your Data Agent confidently replies "No results found." Zoé didn't disappear; she's just hiding behind an accent!
Let's unpack why that happens, and how to fix it correctly in Microsoft Fabric.
A Quick Note on Diacritics (and Why They Matter)
Let's take a simple employee table with an id, a first name, a last name and a department. The firm is French, so there are quite a few accents in the names.
id,prenom,nom,departement
1,Jean,Dupont,IT
2,Élodie,Martin,HR
3,François,Lambert,Finance
4,Marie,Curie,IT
5,André,Bernard,Marketing
6,Cécile,Durand,Finance
7,Pierre,Moreau,IT
8,Naïma,Legrand,HR
9,Luc,Petit,Marketing
10,Ève,Robert,IT
11,Paul,Dubois,Finance
12,Zoé,Merci,HR
13,Antoine,Girard,Marketing
14,Ça va,Test,Légal
15,Loïc,Deschamps,IT
16,Inès,Fournier,Finance
17,Thomas,Rousseau,HR
18,Audrey,Tremblay,Marketing
19,Étienne,Perrot,IT
20,Camille,Lefèvre,Finance
So, with this data loaded into a Lakehouse table and our Data Agent configured, let's try to find Zoé.

If you've read this article's title, you're probably expecting that result. Before jumping into SQL, let's clarify what we're dealing with. A diacritic is a mark added to a letter that changes its pronunciation or meaning. In Latin-based languages, for example:
- é, è, ê (French)
- ñ (Spanish)
- ü (German)
- ç (French, Portuguese)
In many countries, dropping the accent does not change identity in casual usage: "Zoé" and "Zoe" refer to the same person. Technically, however, é and e are different Unicode characters, so "Zoé" and "Zoe" are different strings.
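To make "different Unicode characters" concrete, here is a small Python sketch (standard `unicodedata` module only, for illustration; it is not Fabric code) that prints the code points of each string. It also shows that "é" can be stored either precomposed (U+00E9, NFC form) or as "e" followed by a combining acute accent (U+0301, NFD form); the two forms look identical but are not equal code point for code point.

```python
import unicodedata

# "Zoé" written with an explicit escape so the source encoding cannot interfere.
composed = unicodedata.normalize("NFC", "Zo\u00e9")    # é as one code point
decomposed = unicodedata.normalize("NFD", "Zo\u00e9")  # e + combining accent


def code_points(s: str) -> list[str]:
    """Return the Unicode code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(code_points("Zoe"))       # ['U+005A', 'U+006F', 'U+0065']
print(code_points(composed))    # ['U+005A', 'U+006F', 'U+00E9']
print(code_points(decomposed))  # ['U+005A', 'U+006F', 'U+0065', 'U+0301']

# For a computer, none of these strings are equal.
print("Zoe" == composed)         # False
print(composed == decomposed)    # False
```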
Now, let's look at the SQL query generated by the Data Agent.

The LLM powering the Data Agent has correctly translated our natural-language question into the SQL dialect, so we end up with the predicate WHERE prenom = 'zoe'. However, the default rules for sorting and comparing character data in the SQL analytics endpoint of a Lakehouse, what we call a collation (https://learn.microsoft.com/en-us/fabric/data-warehouse/collation), are accent sensitive. To be precise, the default collation is Latin1_General_100_BIN2_UTF8, a binary collation, which makes comparisons case sensitive as well.
So, zoe is different from Zoé.
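A binary (BIN2) collation compares strings by code point. The Python sketch below mimics that behavior on the first names of our table, purely as an illustration of the comparison rules, not of how the SQL endpoint is implemented: exact equality finds nothing for "zoe", and sorting by code point puts uppercase before lowercase and accented letters after both.

```python
# First names (prenom column) from the employee table above.
prenoms = [
    "Jean", "Élodie", "François", "Marie", "André", "Cécile", "Pierre",
    "Naïma", "Luc", "Ève", "Paul", "Zoé", "Antoine", "Ça va", "Loïc",
    "Inès", "Thomas", "Audrey", "Étienne", "Camille",
]

# Under a binary collation, '=' behaves like exact code-point equality:
# 'z' (U+007A) differs from 'Z' (U+005A), and 'e' (U+0065) from 'é' (U+00E9).
matches = [p for p in prenoms if p == "zoe"]
print(matches)  # []

# Binary ordering follows code points too: uppercase ASCII sorts before
# lowercase ASCII, and accented Latin letters (U+00C0 and above) after both.
sample = ["zoe", "Élodie", "Zoé", "Camille", "Zoe"]
print(sorted(sample))  # ['Camille', 'Zoe', 'Zoé', 'zoe', 'Élodie']
```

This is exactly the "No results found" situation: the data contains "Zoé", but no value is code-point-identical to "zoe".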
How to make the Data Agent smarter about accents?
At the time of writing, there is no way to change the collation of a SQL analytics endpoint to an accent-insensitive one, so you must apply it at query time with a COLLATE clause:
SELECT id, prenom, nom, departement
FROM employees
WHERE prenom COLLATE Latin1_General_CI_AI LIKE '%zoe%'
This query returns Zoé even though the search term is zoe: in the collation name, CI stands for case insensitive and AI for accent insensitive. You can instruct the Data Agent to use this construct through data source-specific instructions.
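To get an intuition for what the COLLATE Latin1_General_CI_AI LIKE '%zoe%' predicate does, the Python sketch below approximates it: each string is decomposed (NFD), combining marks are dropped, and the rest is casefolded before a substring test. This is an assumption-laden stand-in, not SQL Server's collation algorithm; for instance, letters such as ø or œ have no Unicode decomposition, so this sketch leaves them unchanged, and a real collation may treat some characters differently. The helper names `fold` and `like_ci_ai` are hypothetical.

```python
import unicodedata


def fold(s: str) -> str:
    """Approximate a case- and accent-insensitive key (hypothetical helper).

    Decompose to NFD, drop combining marks (category 'Mn'), then casefold.
    """
    decomposed = unicodedata.normalize("NFD", s)
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return without_marks.casefold()


def like_ci_ai(value: str, fragment: str) -> bool:
    """Rough equivalent of: value COLLATE ..._CI_AI LIKE '%fragment%'."""
    return fold(fragment) in fold(value)


# First names (prenom column) from the employee table above.
prenoms = [
    "Jean", "Élodie", "François", "Marie", "André", "Cécile", "Pierre",
    "Naïma", "Luc", "Ève", "Paul", "Zoé", "Antoine", "Ça va", "Loïc",
    "Inès", "Thomas", "Audrey", "Étienne", "Camille",
]

print([p for p in prenoms if like_ci_ai(p, "zoe")])  # ['Zoé']
print(fold("Ça va"))  # ca va
```

Note that LIKE '%zoe%' matches substrings, so a longer name containing "zoe" would be returned as well; an equality predicate under the same collation would match only the exact name.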

Once the custom instructions are added, let's ask our question again.

We found Zoé!
A word of caution about performance
I haven't looked at the internal implementation details, but one thing is certain: the Delta file format knows nothing about SQL-specific collations. The physical storage of the data is therefore not organized for efficient queries under an accent-insensitive comparison, so the SQL engine will probably have to scan the entire dataset to apply this filter.
If it's a one-off query over a few thousand records, the solution above is probably good enough. If you run this query often, or have millions of records, you are better off computing a new column that holds the normalized value.
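In Fabric, such a column would typically be computed when the data is loaded (for example in a notebook) and written back to the Lakehouse table. The sketch below only illustrates the idea, using Python's standard `sqlite3` module as a stand-in for the Lakehouse; the column name `prenom_normalise`, the helper `normalize_name` and the normalization rule (strip combining accents, then casefold) are hypothetical choices, not something Fabric provides.

```python
import sqlite3
import unicodedata


def normalize_name(s: str) -> str:
    """Hypothetical normalization rule: strip combining accents, then casefold."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    ).casefold()


# A few rows from the employee table above.
rows = [
    (1, "Jean", "Dupont", "IT"),
    (2, "Élodie", "Martin", "HR"),
    (12, "Zoé", "Merci", "HR"),
    (19, "Étienne", "Perrot", "IT"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, prenom TEXT, nom TEXT, "
    "departement TEXT, prenom_normalise TEXT)"
)

# The normalized value is computed once, at load time, not at every query.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [(i, p, n, d, normalize_name(p)) for i, p, n, d in rows],
)

# The search term goes through the same rule, so a plain equality works
# without any COLLATE clause.
search = normalize_name("zoe")
result = conn.execute(
    "SELECT id, prenom, nom, departement FROM employees "
    "WHERE prenom_normalise = ?",
    (search,),
).fetchall()
print(result)  # [(12, 'Zoé', 'Merci', 'HR')]
conn.close()
```

With such a column in place, the Data Agent instructions could ask it to lowercase and strip accents from the search term and filter on the normalized column, which the default collation handles as an ordinary comparison. The key design constraint is that the stored values and the search term must be normalized with the same rule.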






