Skip to content

Commit

Permalink
Make the main binary a Unicode application on Windows
Browse files Browse the repository at this point in the history
This commit uses "wide" (i.e. UTF-16) commandline arguments when launching the
program on Windows, and uses the "wide" Windows APIs when dealing with strings.
This makes it independent from the currently selected active code page on the
system, and hopefully fixes issues with "nonstandard" characters appearing in
file and directory paths.

Fixes #172.
  • Loading branch information
lighterowl committed Feb 4, 2020
1 parent 9ffe337 commit 92246c3
Show file tree
Hide file tree
Showing 8 changed files with 73 additions and 15 deletions.
1 change: 1 addition & 0 deletions libmediation/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ add_library(mediation STATIC
)

IF(WIN32)
target_compile_definitions(mediation PRIVATE "-DUNICODE")
target_sources(mediation PRIVATE fs/osdep/file_win32.cpp)
ELSE()
target_sources(mediation PRIVATE fs/osdep/file_unix.cpp)
Expand Down
19 changes: 10 additions & 9 deletions libmediation/fs/directory.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ bool createDir(const std::string& dirName, bool createParentDirs)
parentDir == string("\\\\.\\") || // UNC patch prefix
(strStartWith(parentDir, "\\\\.\\") && parentDir[parentDir.size() - 1] == '}')) // UNC patch prefix
continue;
if (CreateDirectory(parentDir.c_str(), 0) == 0)
if (CreateDirectory(toWide(parentDir).data(), 0) == 0)
{
if (GetLastError() != ERROR_ALREADY_EXISTS)
return false;
Expand All @@ -110,7 +110,7 @@ bool createDir(const std::string& dirName, bool createParentDirs)
#if __linux__ == 1 || (defined(__APPLE__) && defined(__MACH__))
return mkdir(dirName.c_str(), S_IREAD | S_IWRITE | S_IEXEC) == 0;
#elif defined(_WIN32)
return CreateDirectory(dirName.c_str(), 0) != 0;
return CreateDirectory(toWide(dirName).data(), 0) != 0;
#endif
}

Expand All @@ -119,7 +119,7 @@ bool deleteFile(const string& fileName)
#if __linux__ == 1 || (defined(__APPLE__) && defined(__MACH__))
return unlink(fileName.c_str()) == 0;
#else
if (DeleteFile(fileName.c_str()))
if (DeleteFile(toWide(fileName).data()))
{
return true;
}
Expand All @@ -129,7 +129,7 @@ bool deleteFile(const string& fileName)
return false;
}

return DeleteFile(fileName.c_str()) != 0;
return DeleteFile(toWide(fileName).data()) != 0;
#endif
}

Expand All @@ -144,16 +144,16 @@ bool findFiles(const string& path, const string& fileMask, vector<string>* fileL
WIN32_FIND_DATA fileData; // Data structure describes the file found
HANDLE hSearch; // Search handle returned by FindFirstFile

hSearch = FindFirstFile(TEXT(string(path + '/' + fileMask).c_str()), &fileData);
auto searchStr = toWide(path + '/' + fileMask);
hSearch = FindFirstFile(searchStr.data(), &fileData);
if (hSearch == INVALID_HANDLE_VALUE)
return false;

do
{
if (!(fileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
{
string fileName(fileData.cFileName);

auto fileName = toUtf8(fileData.cFileName);
fileList->push_back(savePaths ? (path + '/' + fileName) : fileName);
}
} while (FindNextFile(hSearch, &fileData));
Expand All @@ -168,15 +168,16 @@ bool findDirs(const string& path, vector<string>* dirsList)
WIN32_FIND_DATA fileData; // Data structure describes the file found
HANDLE hSearch; // Search handle returned by FindFirstFile

hSearch = FindFirstFile(TEXT(string(path + "*").c_str()), &fileData);
auto searchStr = toWide(path + "*");
hSearch = FindFirstFile(searchStr.data(), &fileData);
if (hSearch == INVALID_HANDLE_VALUE)
return false;

do
{
if (!(fileData.dwFileAttributes ^ FILE_ATTRIBUTE_DIRECTORY))
{
string dirName(fileData.cFileName);
auto dirName = toUtf8(fileData.cFileName);

if ("." != dirName && ".." != dirName)
dirsList->push_back(path + dirName + "/");
Expand Down
7 changes: 4 additions & 3 deletions libmediation/fs/osdep/file_win32.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,8 @@ File::File(const char* fName, unsigned int oflag, unsigned int systemDependentFl
else
systemDependentFlags = FILE_FLAG_RANDOM_ACCESS;
}
m_impl = CreateFile(fName, dwDesiredAccess, dwShareMode, NULL, dwCreationDisposition, systemDependentFlags, NULL);
m_impl = CreateFile(toWide(fName).data(), dwDesiredAccess, dwShareMode, NULL, dwCreationDisposition,
systemDependentFlags, NULL);
if (m_impl == INVALID_HANDLE_VALUE)
{
throwFileError();
Expand Down Expand Up @@ -100,8 +101,8 @@ bool File::open(const char* fName, unsigned int oflag, unsigned int systemDepend
}

createDir(extractFileDir(fName), true);

m_impl = CreateFile(fName, dwDesiredAccess, dwShareMode, NULL, dwCreationDisposition, systemDependentFlags, NULL);
m_impl = CreateFile(toWide(fName).data(), dwDesiredAccess, dwShareMode, NULL, dwCreationDisposition,
systemDependentFlags, NULL);
if (m_impl == INVALID_HANDLE_VALUE)
{
return false;
Expand Down
27 changes: 27 additions & 0 deletions libmediation/types/types.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -488,3 +488,30 @@ uint32_t random32()
static std::minstd_rand raand(dev());
return static_cast<std::uint32_t>(raand());
}

#ifdef _WIN32
#include <windows.h>

std::vector<wchar_t> toWide(const std::string& utf8Str) { return toWide(utf8Str.c_str(), utf8Str.size()); }

std::vector<wchar_t> toWide(const char* utf8Str, int sz)
{
auto requiredSiz = MultiByteToWideChar(CP_UTF8, 0, utf8Str, sz, nullptr, 0);
std::vector<wchar_t> multiByteBuf(static_cast<std::size_t>(requiredSiz));
MultiByteToWideChar(CP_UTF8, 0, utf8Str, sz, multiByteBuf.data(), requiredSiz);
if (multiByteBuf.empty() || multiByteBuf.back() != 0)
{
multiByteBuf.push_back(0);
}
return multiByteBuf;
}

std::string toUtf8(const wchar_t* wideStr)
{
auto needed = WideCharToMultiByte(CP_UTF8, 0, wideStr, -1, nullptr, 0, nullptr, nullptr);
needed--; // includes terminating null byte, needless when returning a std::string.
std::string s(static_cast<std::size_t>(needed), 0);
WideCharToMultiByte(CP_UTF8, 0, wideStr, -1, &s[0], needed, nullptr, nullptr);
return s;
}
#endif
6 changes: 6 additions & 0 deletions libmediation/types/types.h
Original file line number Diff line number Diff line change
Expand Up @@ -115,4 +115,10 @@ static uint32_t FOUR_CC(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
return my_ntohl((uint32_t(a) << 24) + (uint32_t(b) << 16) + (uint32_t(c) << 8) + uint32_t(d));
}

#ifdef _WIN32
std::vector<wchar_t> toWide(const char*, int sz = -1);
std::vector<wchar_t> toWide(const std::string&);
std::string toUtf8(const wchar_t*);
#endif

#endif //__T_TYPES_H
3 changes: 2 additions & 1 deletion tsMuxer/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
cmake_minimum_required (VERSION 3.1)
project (tsmuxer LANGUAGES CXX)

add_executable (tsmuxer
add_executable (tsmuxer WIN32
aac.cpp
aacStreamReader.cpp
AbstractDemuxer.cpp
Expand Down Expand Up @@ -104,6 +104,7 @@ target_link_libraries(tsmuxer mediation ${THREADSLIB} ${ZLIB_LIBRARIES})
if (WIN32)
target_sources(tsmuxer PRIVATE osdep/textSubtitlesRenderWin32.cpp)
target_link_libraries(tsmuxer gdiplus)
target_compile_definitions(tsmuxer PRIVATE "-DUNICODE")
else()
target_sources(tsmuxer PRIVATE osdep/textSubtitlesRenderFT.cpp)
target_link_libraries(tsmuxer ${FREETYPE2_LIBRARIES})
Expand Down
21 changes: 21 additions & 0 deletions tsMuxer/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -547,8 +547,29 @@ All parameters in this group start with two dashes:\n\
LTRACE(LT_INFO, 2, help);
}

#ifdef _WIN32
#include <shellapi.h>
int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
int argc;
auto argvWide = CommandLineToArgvW(GetCommandLineW(), &argc);
std::vector<std::string> argv_utf8;
argv_utf8.reserve(static_cast<std::size_t>(argc));
for (int i = 0; i < argc; ++i)
{
argv_utf8.emplace_back(toUtf8(argvWide[i]));
}
LocalFree(argvWide);
std::vector<char*> argv;
argv.reserve(argv_utf8.size());
for (auto&& s : argv_utf8)
{
argv.push_back(&s[0]);
}
#else
int main(int argc, char** argv)
{
#endif
LTRACE(LT_INFO, 2, "tsMuxeR version " TSMUXER_VERSION << ". github.com/justdan96/tsMuxer");
int firstMplsOffset = 0;
int firstM2tsOffset = 0;
Expand Down
4 changes: 2 additions & 2 deletions tsMuxerGUI/tsmuxerwindow.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1176,7 +1176,7 @@ void splitLines(const QString &str, QList<QString> &strList)

void TsMuxerWindow::addLines(const QByteArray &arr, QList<QString> &outList, bool isError)
{
QString str = QString::fromLocal8Bit(arr);
QString str = QString::fromUtf8(arr);
QList<QString> strList;
splitLines(str, strList);
QString text;
Expand Down Expand Up @@ -2398,7 +2398,7 @@ bool TsMuxerWindow::saveMetaFile(const QString &metaName)
msgBox.exec();
return false;
}
QByteArray metaText = ui->memoMeta->toPlainText().toLocal8Bit();
QByteArray metaText = ui->memoMeta->toPlainText().toUtf8();
file.write(metaText);
file.close();
return true;
Expand Down

15 comments on commit 92246c3

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiled via msys2
image

@lighterowl
Copy link
Contributor Author

@lighterowl lighterowl commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, console output will be borked - this is a separate issue. Things will (or, rather, should) work fine if you use the GUI. You can also try setting the MinGW console to UTF-8 and see if that fixes stuff, but keep in mind that it might break other applications that you run in that console.

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopped running from Win10 64bit
image

@lighterowl
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually expected - arguments in the standard console window are encoded using the ACP by the default, so there is no way to pass characters outside your ACP (which I'm now sure is Windows-1251 after seeing "Корпорация Майкрософт") even if the application is Unicode-aware.

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is bad \8^(
image

@lighterowl
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look at this later on - have to admit I didn't even think of testing with the standard console, which is known to cause issues exactly like this. I don't think chcp is the answer here, but I might be wrong.

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually expected - arguments in the standard console window are encoded using the ACP by the default, so there is no way to pass characters outside your ACP (which I'm now sure is Windows-1251 after seeing "Корпорация Майкрософт") even if the application is Unicode-aware.

Are you sure?
image

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your commit got into the nightly build it would be good to roll back and rebuild it because tsmuxer is used often from the console

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsMuxerGUI.exe from w64-nightly-2020-02-05--01-09-49.zip works great!
image
but tsMuxeR.exe from w64-nightly-2020-02-05--01-09-49.zip doesn't even start:
image
image

@qyot27
Copy link
Contributor

@qyot27 qyot27 commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI is starting, the output just isn't being given to cmd.exe. -mwindows is probably getting set somewhere when building the CLI, and that needs to be removed or appended with -mconsole (whichever is true, -mconsole needs to be last in the linker command). It's also a little sneaky because the dependencies can pop -mwindows into the linker (SDL is/was notorious for this, for example).

@qyot27
Copy link
Contributor

@qyot27 qyot27 commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namely, the lack of console output was probably directly linked to adding WIN32 to add_executable in tsmuxer/CMakeLists.txt. According to CMake documentation, passing WIN32 to add_executable sets WIN32_EXECUTABLE, and (via https://cmake.org/cmake/help/latest/prop_tgt/WIN32_EXECUTABLE.html#prop_tgt:WIN32_EXECUTABLE):

 This makes it a GUI executable instead of a console application.

Passing -municode into target_compile_options and not passing WIN32 to add_executable might be the solution. In other apps where it's not possible to control whether the dependencies throw -mwindows into the linker command, the solution is just to brute-force -mconsole to the end of the command, since the last one will take precedence.

@lighterowl
Copy link
Contributor Author

@lighterowl lighterowl commented on 92246c3 Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole problem originates in WinMain, because I - for some reason - decided that it's necessary to use it in Windows Unicode builds instead of the standard int main in order to have access to "wide" parameters, but this is actually not true. Since the command line is received via GetCommandLineW() anyway, it does not matter which entry point is used for the application.
So, yes, the solution is to remove WIN32 (without which I was - understandably - getting linking errors on MinGW) and restore "standard" main as the program's main function, which makes the CRT properly allocate the console handles on Windows - as opposed to WinMain, when little initialisation is done by the CRT in order to simulate a standard console application. I'll take care of this later today.

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since now the tsMuxeR requires .meta in utf8, and many programs write it in 8-bit encoding, it is necessary ANNOUNCE this for save compatibility

@abakum
Copy link
Contributor

@abakum abakum commented on 92246c3 Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcdr428
Copy link
Collaborator

@jcdr428 jcdr428 commented on 92246c3 Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xavery for the commit to work, I've had to add at the top of file_win32.cpp and directory.cpp:

#ifndef UNICODE
#define UNICODE
#endif

Could you please advise if any alternative.

EDIT : sorry, forget about this... for whatever reason CMakeLists.txt was not referenced anymore in Visual Studio.

Please sign in to comment.