Summary: Here's how to remove the pesky Unicode byte-order marks added by Visual Studio.
Visual Studio isn't a bad development environment, but it has its faults. One in particular that irks me is its habit of inserting a Unicode byte-order mark (or BOM) at the beginning of each file. When my colleagues in other locations, who have different regional settings, open these files, their copy of Visual Studio may rewrite the file to remove the mark.
This is problematic. When my colleagues look at their changes in version control, they'll see a change in the file even though they don't remember editing it at all. Worse still, since RFC 3629, the UTF-8 specification, calls for the BOM to be silently ignored when it's not needed, some diff viewers won't make it obvious what has actually changed.
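If you'd like to check for yourself whether a file carries the mark, looking at its first three bytes is enough. Here's a minimal sketch, assuming a shell with head, od and tr available. The has_bom helper name and the two sample files are my own inventions for illustration, not part of any tool.

```shell
# has_bom FILE: succeeds if FILE starts with the UTF-8 BOM bytes EF BB BF.
# (has_bom is a hypothetical helper name used only in this sketch.)
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Build two small sample files in a scratch directory.
tmp=$(mktemp -d)
printf '\357\273\277hello\n' > "$tmp/with_bom.txt"
printf 'hello\n' > "$tmp/without_bom.txt"

# Report on each sample file.
for f in "$tmp/with_bom.txt" "$tmp/without_bom.txt"; do
    if has_bom "$f"; then
        echo "$f: BOM present"
    else
        echo "$f: no BOM"
    fi
done
```

Only the first three bytes are compared, so a file that merely contains the same byte sequence somewhere in its body is not flagged.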
What would be ideal is to have a configurable, default file encoding. Alas, this is not possible to do. Instead, I use the following script as part of my Save and Save All macros in VS.
# Removes all Unicode BOMs from files, ignoring any files in *.git directories, and converts Windows CRLF to Unix LF.
# distilledb.com/blog
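find . -path '*/.git/*' -prune -o -type f -print |
while IFS= read -r line
do
    # Keep only files whose first three bytes are the UTF-8 BOM (EF BB BF).
    # Checking just the start avoids matching the same bytes elsewhere in a file.
    head -c 3 "$line" | xxd -p | grep -q '^efbbbf' && echo "$line"
done |
while IFS= read -r line
do
    echo "[[[ $line"
    # Copy the file, skipping the first 3-byte input block (the BOM).
    dd if="$line" of="$line.result" ibs=3 skip=1
    diff "$line" "$line.result"
    dos2unix -v "$line.result"
    mv -v "$line.result" "$line"
    echo "]]]"
done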
A few notes:
* Remove the mv -v "$line.result" "$line" line to do a dry run. This leaves behind *.result files that you can inspect to verify that the appropriate changes have been made. Afterwards, run find . -name '*.result' -print -exec rm {} + to clean up.
* You can remove the -v flags on the dos2unix -v and mv -v lines to suppress the verbose output.
* Replace .git with .svn in the find command if you use Subversion for version control.
* Remove the dos2unix line if you want to keep Windows CRLF line endings rather than converting them to Unix LF.
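To see what the dry run from the first note looks like on a single file, here's a small sketch that works entirely in a scratch directory. The sample.cs file and its contents are invented for illustration, and od is used only to dump the leading bytes so you can eyeball them.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
printf '\357\273\277line one\r\nline two\r\n' > "$tmp/sample.cs"

# Dry run: write the BOM-free copy next to the original, but do not mv it back.
dd if="$tmp/sample.cs" of="$tmp/sample.cs.result" ibs=3 skip=1 2>/dev/null

# The copy should be exactly 3 bytes (the BOM) shorter than the original.
orig_size=$(wc -c < "$tmp/sample.cs")
new_size=$(wc -c < "$tmp/sample.cs.result")
echo "bytes removed: $((orig_size - new_size))"

# Dump the first bytes of each file to confirm the BOM is gone.
head -c 8 "$tmp/sample.cs" | od -An -tx1
head -c 8 "$tmp/sample.cs.result" | od -An -tx1

# Clean up the leftover *.result files, as described in the note.
find "$tmp" -name '*.result' -print -exec rm {} +
```

Since the mv step is left out, the original sample.cs still has its BOM and CRLF line endings once the sketch finishes.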
Additional reading
See here for information about Unicode BOMs.


