Porting an Open Source Project to VC++

This is my attempt to document the frustrations that a developer on the commercial side of the tracks experiences when dealing with open source projects. It is planned as a day-by-day account, but I am not sure I will continue these notes beyond the first few days: time is short, and the problems get more specific by the day.

Anyone who has ever tried to port a typical open source project knows well that open source code is very portable, as long as the compiler is a particular version of GCC on x86 and the OS is some kind of Unix.

For reasons I am not going to go into here, I needed to port a fairly large open source project to Visual C++. I decided to try to keep a diary, so that (hopefully) some open source guru can finally see the portability issue from the non-GCC / Unix angle.

11/6/03. Downloaded the code. The readme clearly states that there is no way to compile the project under MSVC, "so don't even ask". A cursory review of the source located the first major problem - a huge (5000 lines) shell script produced by autoconf that essentially does the following things:

In a typical non-open source project, I would simply have faced a couple of small configuration files to edit - and in an hour I would be on my way. Here I am in big trouble: there is no file to edit, because the configuration file is generated by the script, which is itself generated by autoconf from another file (configure.in). I cannot even find a full list of the configuration defines, or any English-language descriptions of them. Simply executing the script does not help, as I do not know which one of the 4 supported compilers is the best starting point. There are no meaningful (i.e., non-local) comments among the 5000 lines of the script. Note that the final result of the script contains fewer than 100 lines - if not for the need to compile automatically under hundreds of similar but different configurations, I would need to read 50 times less code.

To those of you who say that I do not need to read the configure script, because a relatively short configure.in should be enough, I can only respond that:

I decided to spend an hour or so on a brute force attack - replace the file with its prototype (practically no defines) and see what breaks.

Started with an automatically generated Win32 console project (Visual Studio .NET 2003) and manually added the files into the tree.

The first modifications were trivial: change main() to _tmain(), add stdafx.h. Had to tweak the precompiled headers setting.

11/6/03. Need unistd.h and dirent.h files. Borrowed them from the Mingw32 project. Enabled run-time type information, and most of the files compile successfully ☺ Brute force works - abandoned any ideas of tweaking the build scripts.

First bug in the original code: a missing config.h include in a trivial file. Under some configurations it compiles, under others it does not. Next, a minor compiler quirk: MSVC does not like functions that declare a result but have no return statement. Well, some functions in the original code end in assert() ☺.
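A minimal sketch of that pattern and one way to make it portable. The names Color and color_name are made up for illustration and are not from the project: the "impossible" path still ends in assert(), but it is followed by an explicit return so that every control path yields a value.

```cpp
#include <cassert>

// Hypothetical example type, not taken from the ported project.
enum Color { Red, Green };

// Every case returns a value; the fall-through path is "impossible" and
// ends in assert(), as in the original code. The explicit return after the
// assert gives that path a value too, so the function is well defined even
// when assert() is compiled out with NDEBUG.
const char* color_name(Color c) {
    switch (c) {
        case Red:   return "red";
        case Green: return "green";
    }
    assert(false && "unknown Color value");
    return "";  // reached only if the assert is disabled or the enum grows
}
```

The extra return costs nothing at run time and removes the dependence on how a particular compiler treats a missing return.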

First real issue: the popen() function, used to pipe the output of one program into another. Replaced with _popen(). Another one: strcasecmp(). Replaced by the equally nonstandard _stricmp().
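Instead of editing every call site, the two spellings can be hidden behind a thin wrapper. The sketch below is only an illustration, not what the port actually did: the port_* names are invented here, and it assumes that _MSC_VER is defined only by the Microsoft compiler and that every other platform provides the POSIX functions.

```cpp
#include <cassert>
#include <stdio.h>     // FILE, popen()/pclose() or _popen()/_pclose()

#ifdef _MSC_VER
#include <string.h>    // _stricmp()
#else
#include <strings.h>   // strcasecmp()
#endif

// Start a child process and return a stream connected to it.
inline FILE* port_popen(const char* command, const char* mode) {
    assert(command != nullptr && mode != nullptr);
#ifdef _MSC_VER
    return _popen(command, mode);
#else
    return popen(command, mode);
#endif
}

// Close a stream opened by port_popen().
inline int port_pclose(FILE* pipe) {
#ifdef _MSC_VER
    return _pclose(pipe);
#else
    return pclose(pipe);
#endif
}

// Case-insensitive comparison of two C strings; 0 means "equal".
inline int port_stricmp(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
#ifdef _MSC_VER
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}
```

With such a header the rest of the source calls port_popen() and port_stricmp() everywhere, and the platform difference lives in one place.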

Another bug in the original code - XYZ::XYZ::abc is used instead of XYZ::abc. Apparently GCC forgives this...

All C++ files written by humans now compile. Now, going for Lex and Yacc ... will try to use Cygwin first. Bison and flex appear to work.

A full hour spent on non-methodical attempts to find a dirent.c file that matches dirent.h while the latter still compiles with both C and C++.

Another funny function: isatty() ☺. Got rid of it by changing the lexer defines to non-interactive.

Everything links - and I got "Hello world" running! Not bad for a few spare hours of work ... The quality and portability of the code are amazingly good.

11/7/03. Digging deeper, found that the code can be compiled with any YACC-like and LEX-like utilities, as long as their names are bison and flex respectively. So much for the actual value of the configuration script.

First big problem: as usual, the Unix source does its own filename parsing - as a result, filenames with spaces do not work. The simple workaround used in this code is to always use the 8.3 naming convention. Unfortunately, not all the places handle this accurately. The quick route around is to do nothing and use the 8.3 convention myself - otherwise, major surgery is in order - which is inevitable anyhow, at least to get normal diagnostics for "file not found" messages and the ability to install the program in \Program Files. It is amusing to see how open source folks make fun of the MICROS~1 legacy (which MS no longer uses in its own development) - and use this legacy at the same time. Wouldn't it be great if K&R had thought about a filename parsing API - or somebody had added such an API to Linux as a part of the innovation process? ☺ Filename parsing tips for Windows have been around since 1995.
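For illustration only, here is the kind of parsing that survives spaces: split on the last path separator and treat everything else, spaces included, as ordinary characters. PathParts and split_path are hypothetical names, not part of the project, and the sketch ignores drive-relative paths and other Windows corner cases.

```cpp
#include <string>

// Hypothetical helper type: the two halves of a path.
struct PathParts {
    std::string dir;   // everything before the last separator (may be empty)
    std::string file;  // everything after it
};

// Split at the last '/' or '\\'. Spaces get no special treatment, so
// "C:\Program Files\..." stays in one piece.
PathParts split_path(const std::string& path) {
    PathParts parts;
    const std::string::size_type pos = path.find_last_of("/\\");
    if (pos == std::string::npos) {
        parts.file = path;                 // no directory component at all
    } else {
        parts.dir = path.substr(0, pos);
        parts.file = path.substr(pos + 1);
    }
    return parts;
}
```

For example, split_path("C:\\Program Files\\tool\\input file.txt") yields the directory "C:\Program Files\tool" and the file name "input file.txt".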

One more bug in the source: the same macro is defined twice with different values in different include files, and is used incorrectly on some occasions. Easy to spot in the actual include configuration files I use (50 lines or so), very hard in the file generation script (see above).

Now a really big problem: the code liberally uses the STL c_str() function whenever a C string is needed and an STL string is available. The problem is that pointers obtained this way are of very limited use in a typical STL implementation, as the C string is allocated and owned by the string class, so it stays valid only while that string object lives and is not modified. It is certainly a bad idea to keep the result around for future use, yet the code returns it from functions, making the chain untraceable, with predictably random bad results. The express prohibition of this practice can be found in http://gcc.gnu.org/onlinedocs/gcc/Temporaries.html .
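A small sketch of the problem and of one possible fix. The function make_label is invented for this example, and the project's real functions are not shown here. The risky variant appears only as a comment, because calling it would be undefined behaviour.

```cpp
#include <cassert>
#include <string>

// Risky pattern, shown only as a comment: the returned pointer refers to the
// buffer of a local std::string that is destroyed when the function returns.
//
//   const char* make_label(const char* name) {
//       std::string s = std::string("file: ") + name;
//       return s.c_str();   // dangling as soon as we return
//   }

// Safer variant: return the std::string itself, so the caller owns the
// storage, and call c_str() only at the point where a C string is needed.
std::string make_label(const char* name) {
    assert(name != nullptr);
    return std::string("file: ") + name;
}
```

A caller then writes something like `std::string label = make_label("a.txt");` and passes `label.c_str()` to the C API while `label` is still alive and unmodified.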

11/7/03. Fixed this and a host of other smaller issues - the port is now apparently operational - some non-trivial tests passed.