About this article

By default, the macOS file system is case insensitive in most directories. In such a world, what does a wildcard match?

Being case insensitive

For example, suppose you run this script:

```bash
#!/bin/bash

set -eu

# ASCII alphanumerics
touch foo.txt FOO.txt
ls *.txt && rm *.txt

# Full-width (so-called double-byte) characters
touch ｚｅｎ.txt ＺＥＮ.txt
ls *.txt && rm *.txt

# Greek letters, Cyrillic letters, Roman numerals
touch ωяⅶ.txt ΩЯⅦ.txt
ls *.txt && rm *.txt

# DZ, NJ
touch ǳ.txt # U+01F3	Latin Small Letter DZ
touch ǲ.txt # U+01F2	Latin Capital Letter D with Small Letter z
touch Ǳ.txt # U+01F1	Latin Capital Letter DZ
touch ǌ.txt # U+01CC	Latin Small Letter NJ
touch ǋ.txt # U+01CB	Latin Capital Letter N with Small Letter J
touch Ǌ.txt # U+01CA	Latin Capital Letter NJ
ls *.txt && rm *.txt

# Dotless i, etc.
touch ı.txt # U+0131 Latin Small Letter Dotless I
touch İ.txt # U+0130 Latin Capital Letter I with Dot Above
touch i.txt # U+0069 Latin Small Letter I
touch I.txt # U+0049 Latin Capital Letter I
ls *.txt && rm *.txt

# Enclosed alphanumerics
touch ⓐ.txt # U+24D0 Circled Latin Small Letter A
touch Ⓐ.txt # U+24B6 Circled Latin Capital Letter A
ls *.txt && rm *.txt
```

This is the result:

```
foo.txt
ｚｅｎ.txt
ωяⅶ.txt
ǌ.txt	ǳ.txt
i.txt	İ.txt	ı.txt
ⓐ.txt
```

foo and FOO are the easy case. Because the file system is case insensitive, only one of them survives.

As ｚｅｎ, ωяⅶ, and ⓐ show, characters outside the 7-bit ASCII range also have case, and the file system does not distinguish their cases either.

ǲ belongs to the category "Titlecase Letter" and is neither uppercase nor lowercase. Its lowercase counterpart is ǳ and its uppercase counterpart is Ǳ. These three are also treated as the same letter, so even if you run touch ǳ.txt ǲ.txt Ǳ.txt, only one file survives.
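The relationship between these three characters can be checked with Python's standard unicodedata module. This is only an illustrative sketch based on Python's built-in Unicode tables. The helper name describe is made up for this example, and nothing here says how APFS itself is implemented.

```python
import unicodedata

# The three code points of the DZ digraph.
DZ_UPPER = "\u01F1"  # Ǳ
DZ_TITLE = "\u01F2"  # ǲ
DZ_LOWER = "\u01F3"  # ǳ


def describe(ch: str) -> dict:
    """Return the general category and the case conversions of one character."""
    return {
        "category": unicodedata.category(ch),  # "Lu", "Lt" or "Ll"
        "lower": ch.lower(),
        "upper": ch.upper(),
        "title": ch.title(),
    }


if __name__ == "__main__":
    for ch in (DZ_UPPER, DZ_TITLE, DZ_LOWER):
        print(f"U+{ord(ch):04X}", describe(ch))
```

The titlecase form reports the category Lt, while its lower() and upper() results are the other two code points, which is why all three names collide on a case-insensitive volume.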

The table below shows the case pairs for ı, İ, i, and I.

Language	Lowercase of `I`	Uppercase of `i`
English	`i`	`I`
Turkish	`ı`	`İ`

If you touch `ı.txt`, `İ.txt`, `i.txt`, and `I.txt`, three files survive: `i.txt`, `İ.txt`, and `ı.txt`. Only `i.txt` and `I.txt` collide.

According to unicode.org, the uppercase of the dotless lowercase ı is the ordinary I. Nevertheless, APFS treats ı.txt and I.txt as different names.
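One possible reading of this result is that names are compared by Unicode case folding rather than by converting them to uppercase. That is an assumption on my part, not something the experiment proves. The sketch below uses Python's str.casefold() and str.upper() to show that the two approaches disagree on exactly these characters. The function name survivors is hypothetical.

```python
def survivors(names, key):
    """Keep the first name for each distinct key, like a directory where names collide."""
    seen = {}
    for name in names:
        seen.setdefault(key(name), name)
    return list(seen.values())


# ı, İ, i, I in the same order as the touch commands in the script.
NAMES = ["\u0131.txt", "\u0130.txt", "i.txt", "I.txt"]

by_casefold = survivors(NAMES, str.casefold)  # compare by Unicode case folding
by_upper = survivors(NAMES, str.upper)        # compare by uppercase conversion

if __name__ == "__main__":
    print("casefold:", by_casefold)
    print("upper:   ", by_upper)
```

With case folding, three names survive, the same set as in the experiment. With uppercase conversion, ı.txt and i.txt would collide as well and only two names would remain.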

Behavior in various environments

I checked what happens in several environments.

shell script (bash) etc.

The results in this section apply to the following environments:

shell script(bash)
C(POSIX glob)
Go(filepath.Glob)
Python3(glob.glob)
PHP7(glob)

Since POSIX glob and bash agree, this seems to be the baseline behavior, but it is rather unpleasant.

Basically, the behavior is:

Case insensitive when the pattern has no wildcards.
Case sensitive when the pattern has wildcards.

It was quite surprising that Foo.txt matches one file while F*.txt matches none.

As far as I can tell, the parts of a pattern without wildcards are resolved the same way the file system resolves them.
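The two rules above can be reproduced with a small model. This is a sketch of the observed behavior, not the source of any real glob implementation. A pattern without wildcards is handed to an existence check, which stands in for the case-insensitive file system, while a pattern with wildcards is compared against the directory listing with a case-sensitive matcher. The names model_glob and ci_exists are hypothetical, and using casefold() for the lookup is an assumption.

```python
import fnmatch
import re

_MAGIC = re.compile(r"[*?\[]")  # characters that turn a pattern into a wildcard


def ci_exists(name, listing):
    """Stand-in for a case-insensitive file system lookup (assumption: casefold)."""
    return any(name.casefold() == entry.casefold() for entry in listing)


def model_glob(pattern, listing):
    """Model of the observed behavior for a single directory."""
    if _MAGIC.search(pattern) is None:
        # No wildcard: ask the "file system"; the pattern itself is returned.
        return [pattern] if ci_exists(pattern, listing) else []
    # Wildcard: compare each directory entry with a case-sensitive match.
    return [entry for entry in listing if fnmatch.fnmatchcase(entry, pattern)]


if __name__ == "__main__":
    files = ["foo.txt", "fred.txt"]
    for pattern in ("F*.txt", "f*.txt", "Foo.txt"):
        print(pattern, model_glob(pattern, files))
```

In this model F*.txt finds nothing because the listing is never case folded, while Foo.txt succeeds because the lookup is delegated to the file system.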

F*/f*/Foo

wildcard	foo.txt	fred.txt
F*.txt	❌	❌
f*.txt	✅	✅
Foo.txt	✅	❌

i / I / ı / İ

wildcard	i-lat-lo.txt	I-lat-up.txt	ı-tur-lo.txt	İ-tur-up.txt
i*.txt	✅	❌	❌	❌
I*.txt	❌	✅	❌	❌
ı*.txt	❌	❌	✅	❌
İ*.txt	❌	❌	❌	✅
İ-tur-lo.txt	❌	❌	❌	❌
I-tur-lo.txt	❌	❌	❌	❌
ı-lat-up.txt	❌	❌	❌	❌

Ǳ / ǲ / ǳ

wildcard	Ǳ-uu.txt	ǲ-ul.txt	ǳ-ll.txt
Ǳ*.txt	✅	❌	❌
ǲ*.txt	❌	✅	❌
ǳ*.txt	❌	❌	✅
ǳ-uu.txt	✅	❌	❌
ǳ-ul.txt	❌	✅	❌
ǲ-uu.txt	✅	❌	❌

ruby(Dir.glob)

Ruby's behavior is quite different from POSIX glob.

Basically, it seems to behave consistently as "case insensitive". With Foo.txt, only foo.txt matches, and with F*.txt, both foo.txt and fred.txt match. Easy to understand.

However,

```ruby
Dir.glob("ǳ-u*.txt") #=> []
Dir.glob("ǳ-uu.txt") #=> ["files/Ǳ-uu.txt"]
```

there are also cases where adding a wildcard reduces the number of matches. Is this a bug?

F*/f*/Foo

wildcard	foo.txt	fred.txt
F*.txt	✅	✅
f*.txt	✅	✅
Foo.txt	✅	❌

i / I / ı / İ

wildcard	i-lat-lo.txt	I-lat-up.txt	ı-tur-lo.txt	İ-tur-up.txt
i*.txt	✅	✅	❌	❌
I*.txt	✅	✅	❌	❌
ı*.txt	❌	❌	✅	❌
İ*.txt	❌	❌	❌	✅
İ-tur-lo.txt	❌	❌	❌	❌
I-tur-lo.txt	❌	❌	❌	❌
ı-lat-up.txt	❌	❌	❌	❌

Ǳ / ǲ / ǳ

wildcard	Ǳ-uu.txt	ǲ-ul.txt	ǳ-ll.txt
Ǳ*.txt	✅	❌	❌
ǲ*.txt	❌	✅	❌
ǳ*.txt	❌	❌	✅
ǳ-uu.txt	✅	❌	❌
ǳ-ul.txt	❌	✅	❌
ǲ-uu.txt	✅	❌	❌

Java(PathMatcher)

java.nio.file has an interface called PathMatcher, so I tried it. This is also quite different from POSIX glob: it appears to be case sensitive at all times. That differs from how the file system treats file names, but at least it is consistent.

F*/f*/Foo

wildcard	foo.txt	fred.txt
F*.txt	❌	❌
f*.txt	✅	✅
Foo.txt	❌	❌

i / I / ı / İ

wildcard	i-lat-lo.txt	I-lat-up.txt	ı-tur-lo.txt	İ-tur-up.txt
i*.txt	✅	❌	❌	❌
I*.txt	❌	✅	❌	❌
ı*.txt	❌	❌	✅	❌
İ*.txt	❌	❌	❌	✅
İ-tur-lo.txt	❌	❌	❌	❌
I-tur-lo.txt	❌	❌	❌	❌
ı-lat-up.txt	❌	❌	❌	❌

Ǳ / ǲ / ǳ

wildcard	Ǳ-uu.txt	ǲ-ul.txt	ǳ-ll.txt
Ǳ*.txt	✅	❌	❌
ǲ*.txt	❌	✅	❌
ǳ*.txt	❌	❌	✅
ǳ-uu.txt	❌	❌	❌
ǳ-ul.txt	❌	❌	❌
ǲ-uu.txt	❌	❌	❌

C#(.NET Core / Directory.GetFiles)

The behavior is similar to Ruby's. Unlike Ruby, Ǳ*.txt properly (?) matches ǳ-ll.txt. On the other hand, the pattern ǳ-uu.txt does not find Ǳ-uu.txt.

F*/f*/Foo

wildcard	foo.txt	fred.txt
F*.txt	✅	✅
f*.txt	✅	✅
Foo.txt	✅	❌

i / I / ı / İ

wildcard	i-lat-lo.txt	I-lat-up.txt	ı-tur-lo.txt	İ-tur-up.txt
i*.txt	✅	✅	❌	❌
I*.txt	✅	✅	❌	❌
ı*.txt	❌	❌	✅	❌
İ*.txt	❌	❌	❌	✅
İ-tur-lo.txt	❌	❌	❌	❌
I-tur-lo.txt	❌	❌	❌	❌
ı-lat-up.txt	❌	❌	❌	❌

Ǳ / ǲ / ǳ

wildcard	Ǳ-uu.txt	ǲ-ul.txt	ǳ-ll.txt
Ǳ*.txt	✅	✅	✅
ǲ*.txt	✅	✅	✅
ǳ*.txt	✅	✅	✅
ǳ-uu.txt	❌	❌	❌
ǳ-ul.txt	❌	❌	❌
ǲ-uu.txt	❌	❌	❌

C#(Mono / Directory.GetFiles)

Surprisingly, .NET Core and Mono behave differently. Mono seems to be defeated by the letter ǲ, which is neither uppercase nor lowercase.

F*/f*/Foo

wildcard	foo.txt	fred.txt
F*.txt	✅	✅
f*.txt	✅	✅
Foo.txt	✅	❌

i / I / ı / İ

wildcard	i-lat-lo.txt	I-lat-up.txt	ı-tur-lo.txt	İ-tur-up.txt
i*.txt	✅	✅	❌	❌
I*.txt	✅	✅	❌	❌
ı*.txt	❌	❌	✅	❌
İ*.txt	❌	❌	❌	✅
İ-tur-lo.txt	❌	❌	❌	❌
I-tur-lo.txt	❌	❌	❌	❌
ı-lat-up.txt	❌	❌	❌	❌

Ǳ / ǲ / ǳ

wildcard	Ǳ-uu.txt	ǲ-ul.txt	ǳ-ll.txt
Ǳ*.txt	✅	❌	✅
ǲ*.txt	❌	✅	❌
ǳ*.txt	✅	❌	✅
ǳ-uu.txt	✅	❌	❌
ǳ-ul.txt	❌	❌	❌
ǲ-uu.txt	❌	❌	❌

Perl(glob)

It behaves much like POSIX glob, but treats the dotless lowercase ı differently.

F*/f*/Foo

wildcard	foo.txt	fred.txt
F*.txt	❌	❌
f*.txt	✅	✅
Foo.txt	✅	❌

i / I / ı / İ

wildcard	i-lat-lo.txt	I-lat-up.txt	ı-tur-lo.txt	İ-tur-up.txt
i*.txt	✅	❌	❌	❌
I*.txt	❌	✅	❌	❌
ı*.txt	❌	❌	✅	❌
İ*.txt	❌	❌	❌	✅
İ-tur-lo.txt	❌	❌	❌	❌
I-tur-lo.txt	❌	❌	✅	❌
ı-lat-up.txt	❌	✅	❌	❌

Ǳ / ǲ / ǳ

wildcard	Ǳ-uu.txt	ǲ-ul.txt	ǳ-ll.txt
Ǳ*.txt	✅	❌	❌
ǲ*.txt	❌	✅	❌
ǳ*.txt	❌	❌	✅
ǳ-uu.txt	✅	❌	❌
ǳ-ul.txt	❌	✅	❌
ǲ-uu.txt	✅	❌	❌

Summary

POSIX glob resolves the parts of a pattern without wildcards the same way the file system does, but it is confusing that matching becomes case sensitive as soon as a wildcard is included.

The following behave the same as POSIX glob:

Go(filepath.Glob)
Python3(glob.glob)
PHP7(glob)

On the other hand, the following seem to use their own algorithms and return different results from POSIX glob:

ruby(Dir.glob)
Java(PathMatcher)
C#(.NET Core GetFiles)
C#(Mono Directory.GetFiles)
Perl(glob)

The results tend to get murky around "two-letter digraph characters that have a form in which only the first letter is capitalized" and "the lowercase i with its dot removed".

[JAVA] The unfortunate world of case-insensitive wildcards (macOS)
