Gavin's Computer Technology Blog: C++ in an NT Driver

Guest Article: C++ in an NT Driver

The NT Insider, Vol 14, Issue 2, March - April 2007 | Published: 20-Apr-07 | Modified: 20-Apr-07

By Edouard Alligand

This may sound like an oxymoron, but an ever-increasing number of programmers are willing to try C++ in kernel mode. Why would you use C++?

The primary duty of a driver is not to break anything. If you are successful in doing this, then you might be worthy enough to add some features to the kernel, such as support for a USB pen that actually doesn't write well (but has a shiny blue light).

C++ is really about clarity and concision. It's not just C with a tumor waiting for a merciful doctor to end its suffering. But using it in kernel mode requires an intimate knowledge of the language and the kernel. Writing a driver in C++ is not supported by Microsoft at the moment: you're skiing off piste.

If you think C++ is from Hell and should be banned from this plane of existence, it is unlikely programming your driver in this language is going to offer you anything. But if you like its flexibility and wide range of features and would like to use them in kernel mode, read on.

Some Reading...
Prior to taking the big jump into C++ driver writing, if you are not already familiar with the following books, they are worth your attention:

Effective C++ and More Effective C++ by Scott Meyers

C++ Coding Standards, Exceptional C++ and More Exceptional C++ by Herb Sutter

Modern C++ Design by Andrei Alexandrescu

The C++ Standard Library by Nicolai M. Josuttis

The last reference is not really needed for kernel mode programming, but will certainly improve your C++ skills. Ideally, read all of them, and many others! Several C++ mechanisms that are discussed all too briefly in this article are explained thoroughly in these books.

The Easy Way to C++
Let's get back to our topic. The features that are the easiest to use in kernel mode are:

C++ cast operators, except dynamic_cast<>

The bool type

References

Namespaces

Const (please use and abuse the const keyword! See Item 3 of Effective C++)

In-place declarations and assignments

C++-style comments ("//")

Note: Namespaces, references and const might incur linking problems, but this is a class of issues that can be handled much more easily than the ones we are going to study.

This "Super-C" or "enhanced-C" can be used directly without any additional work. Since such programs can easily be converted to "pure-C" programs, it's easy to understand why the code works.

That is the only C++ one can safely recommend. Kindly refrain from using any more advanced features unless you are ready to deal with the consequences.
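To make the "Super-C" subset concrete, here is a minimal sketch that uses nothing beyond a namespace, const, bool, a reference, and a C++ cast. The names (mydriver, DeviceRegisters, kStatusReady) and the register layout are invented for the illustration, and the standard assert stands in for the kernel ASSERT macro so that the snippet also builds in user mode.

```cpp
#include <cassert>
#include <cstdint>

namespace mydriver {   // hypothetical namespace, keeps names out of the global scope

// Made-up register block, only here to have something to look at.
struct DeviceRegisters {
    std::uint32_t control;
    std::uint32_t status;
};

// A typed constant instead of a #define.
const std::uint32_t kStatusReady = 0x1;

// The reference cannot be null, and const documents that we only read.
inline bool IsReady(const DeviceRegisters& regs)
{
    return (regs.status & kStatusReady) != 0;
}

// A named cast is easy to search for, unlike a C-style cast.
inline const DeviceRegisters& FromRawBuffer(const void* buffer)
{
    assert(buffer != nullptr);   // stand-in for the kernel ASSERT macro
    return *static_cast<const DeviceRegisters*>(buffer);
}

}  // namespace mydriver
```

Every line of this translates mechanically to C (pointers instead of references, prefixes instead of namespaces), which is exactly why this subset is low risk.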

More C++ in Your Kernel
One of the main problems with C++ in the kernel is that most of the "nice" features of the language are not directly available in that mode. Some are easy to recreate and we will see how to do that. However, some features should be forgotten such as C++ exceptions, which are not the same as kernel exceptions.

Such features have to be forgotten simply because there is no support for them in kernel mode. Translation: does not compile. If you have the time and energy you may attempt to port them to kernel mode, but frankly, exceptions are too slow for kernel mode. This will have an impact on your C++ coding style, which is something you should keep in mind.

Dynamic Structure
The first problem you will confront is dynamic memory allocation. Creating objects on the stack works well, but as you can guess, new and delete don't work anymore. That's simply because new calls malloc(), and malloc() in turn ends up calling HeapAlloc(). So, you simply have to write a new and a delete that use ExAllocatePoolWithTag() and ExFreePoolWithTag().

But here comes an unexpected problem: should you allocate from non-paged pool or paged pool? The answer is: it depends. If you are absolutely certain that an object will never be used in a context in which page faults are forbidden then you can allocate from paged pool. Otherwise, you need to allocate from non-paged pool.

Avoid systematically allocating from the non-paged pool under the impression that "it's safer and easier". You might degrade the performance of the whole system by doing so.

There are several different ways to implement your new and delete. One way is to declare a base class with new and delete, as illustrated in the code snippet in Figure 1.

class PagedObject
{
public:
    // Allocate the object from paged pool; returns NULL on failure.
    void *operator new(size_t lBlockSize)
    {
        PAGED_CODE();
        return ExAllocatePoolWithTag(PagedPool, lBlockSize, YOURMEM_TAG);
    }

    // Deleting a NULL pointer must be a harmless no-op.
    void operator delete(void *p)
    {
        PAGED_CODE();

        if (!p)
            return;

        ExFreePoolWithTag(p, YOURMEM_TAG);
    }
};

Figure 1 - Declaring Base Classes New and Delete

After declaring these base classes, enhance your objects with the use of inheritance. A few words about this code snippet might be useful:

It is very important to allow the deletion of the NULL pointer since it is required by the standard. This can be done by checking the value of the provided pointer and returning without an error.

As you can see, this class allocates objects from the paged pool. If you need non-paged pool objects you can simply create a new class for this purpose or "templatize" this one.

This code will be inlined, but you can safely implement the functionality in a separate .cpp file.

It is vital that you check that your objects are properly allocated before using them. Do not assume that new "always works".

Remember that you cannot overload the global new and delete operator.
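The "templatize this one" remark and the "check your allocations" remark can be sketched together. The POOL_TYPE enum and the two Ex* functions below are user-mode stand-ins written for this sketch (the real ones come from the WDK headers), the tag value is arbitrary, and PoolObject, Session, and UseSession are hypothetical names. One assumption worth stating: in standard C++ a new-expression only checks for a null result when the allocation function is declared non-throwing, hence the noexcept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// --- User-mode stand-ins, NOT the real kernel implementations ---
enum POOL_TYPE { NonPagedPool, PagedPool };
static int g_liveAllocations = 0;   // bookkeeping for this sketch only

inline void* ExAllocatePoolWithTag(POOL_TYPE, std::size_t size, unsigned long)
{
    void* p = std::malloc(size);
    if (p) ++g_liveAllocations;
    return p;
}

inline void ExFreePoolWithTag(void* p, unsigned long)
{
    --g_liveAllocations;
    std::free(p);
}
// -----------------------------------------------------------------

const unsigned long YOURMEM_TAG = 0x6D654D59UL;   // arbitrary tag value

// One base class, parameterized on the pool the object lives in.
template <POOL_TYPE Pool>
class PoolObject
{
public:
    // noexcept tells the compiler that a null return is possible,
    // so the constructor is skipped when allocation fails.
    static void* operator new(std::size_t size) noexcept
    {
        return ExAllocatePoolWithTag(Pool, size, YOURMEM_TAG);
    }

    static void operator delete(void* p) noexcept
    {
        if (p)
            ExFreePoolWithTag(p, YOURMEM_TAG);
    }
};

typedef PoolObject<PagedPool>    PagedObject;
typedef PoolObject<NonPagedPool> NonPagedObject;

struct Session : NonPagedObject
{
    int id;
    explicit Session(int i) : id(i) {}
};

// Never assume that new "always works".
inline bool UseSession()
{
    Session* s = new Session(42);
    if (!s)
        return false;
    assert(s->id == 42);
    delete s;
    return true;
}
```

In a real driver the PAGED_CODE() check from Figure 1 would only belong in the paged variant.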

Now you can start firing up some MyObject *p = new MyObject(). However, before doing so, let's improve our allocator a little bit. There are cases where you can really optimize memory allocation. One obvious case is a collection. In a collection, you allocate many objects of the same size and there is a better way than ExAllocatePoolWithTag() to do this.

Placement new operator
One day, you might wake up and say, "I want some action, something intense and ugly." That's the day you'll write (or port) a C++ collection for kernel mode, say a linked list. There is a great benefit to having enhanced collections in kernel mode, but there is also a great risk. Collections quickly become central in any code, and a bug in a collection can be incredibly hard to trace and fix. That being said, using the basic new and delete in your collections isn't going to be efficient.

As you may know, C++ provides you with a placement new operator, which takes an additional argument. The placement new operator is generally used when you want to handle memory differently, and this is precisely what we want to do with our collection.

Let's see how we would write a placement new operator for our PagedObject in Figure 2.

class PagedObject
{
    // ...
    // our previous code

public:
    void *operator new(size_t lBlockSize, void *pParam)
    {
        PAGED_CODE();
        ASSERT(lBlockSize > 0);
        ASSERT(pParam != NULL);
        return pParam;
    }
};

Figure 2 - Placement New Operator for PagedObject

As you can see, it does... well, nothing. You're probably wondering, "What's the point of a new that doesn't allocate memory?" Don't forget that when you write MyObject *p = new MyObject() you are directing the compiler to call the constructor once the memory allocation is successful. Generally, you want the constructor to be called unless you like quantum physics and undetermined states. Since there is no way to make an explicit call to the constructor, you have to use new.

Figure 3 shows an example of how to use the constructor.

// yourlookasidelist has to be properly initialized with
// ExInitializeNPagedLookasideList()
void *pMemory = ExAllocateFromNPagedLookasideList(&yourlookasidelist);
if (!pMemory) // C++ doesn't mean you don't have to handle errors anymore!
    return STATUS_INSUFFICIENT_RESOURCES;
MyObject *p = new(pMemory) MyObject();
// p cannot be NULL here, since the memory allocation has already
// been done, so an ASSERT is welcome
ASSERT(p != NULL);

Figure 3 - Example use of the Constructor

You have several alternatives to clean-up. For example, you can either write a placement delete or make an explicit call to the destructor (this is legal). In either case you need to call ExFreeToNPagedLookasideList() as shown in the following code snippet:

p->~MyObject();
ExFreeToNPagedLookasideList(&yourlookasidelist, p);

I would, however, encourage you to write a placement delete to preserve symmetry and readability. Now that you can do new and delete, you probably think everything is fine... Wrong.
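One way to get that symmetry is a pair of small helpers that keep construction and destruction side by side. This is a sketch under stated assumptions: FakeLookasideList and its two functions are user-mode stand-ins for a lookaside list, CreateFromList and DestroyToList are hypothetical names, and the standard <new> header supplies the placement form that Figure 2 writes by hand. Helpers are used here instead of a placement operator delete because the compiler only calls a placement delete on its own when a constructor throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>       // standard placement operator new

// Stand-in for a lookaside list: fixed-size entries plus a usage counter.
struct FakeLookasideList
{
    std::size_t entrySize;
    int         outstanding;
};

inline void* AllocateFromList(FakeLookasideList& list)
{
    void* p = std::malloc(list.entrySize);
    if (p) ++list.outstanding;
    return p;
}

inline void FreeToList(FakeLookasideList& list, void* p)
{
    --list.outstanding;
    std::free(p);
}

struct MyObject
{
    int value;
    MyObject() : value(7) {}
    ~MyObject() { value = 0; }
};

// Allocate raw memory, then run the constructor in place.
template <typename T>
T* CreateFromList(FakeLookasideList& list)
{
    assert(list.entrySize >= sizeof(T));
    void* memory = AllocateFromList(list);
    if (!memory)
        return nullptr;            // allocation can fail, constructor is skipped
    return new (memory) T();
}

// Run the destructor explicitly, then hand the memory back.
template <typename T>
void DestroyToList(FakeLookasideList& list, T* object)
{
    if (!object)
        return;
    object->~T();
    FreeToList(list, object);
}
```

Callers then write CreateFromList<MyObject>(list) and DestroyToList(list, p), and the destructor call can no longer be forgotten.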

We're Getting Static
Static objects. The work of the Devil. You don't have support for static objects anymore. Why? Well, when do you expect the constructors of your static objects to be called... mmm? DriverEntry? Tough luck, the compiler doesn't support that.

You could add the support, but let's think about other possibilities for a minute. There is a wonderful thing called the driver extension which will happily store all "global objects" for you. Unless you have a case where you cannot access your driver extension, this will fix the problem without writing the support code. This helps you kill two birds with one stone, since storing your data in the driver extension encourages you to write thread-safe code (instead of having global variables you touch at random without remembering that they are global).

That's why, without the shadow of a doubt, you can store all your global objects in your driver extension and allocate them with your new operator in the driver entry. If you have several device objects you will hopefully figure out which one to use. If your driver supports unloading, you will delete them in the unloading routine.

The good news is that you don't have to worry about where your pointers are because the driver extension sits in the non-paged pool. The other good news is that allocating your objects in the driver entry and releasing them in the unloading routine contributes to reducing memory fragmentation compared to the alternative of allocating/releasing on demand.
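The pattern can be sketched as a plain struct of pointers that is filled in at load time and emptied at unload time. Everything here is illustrative: Logger, Cache, DriverGlobals, InitGlobals, and CleanupGlobals are hypothetical names, and std::nothrow stands in for the class-level operator new described earlier. In a driver the struct itself would live in the extension obtained with IoAllocateDriverObjectExtension().

```cpp
#include <cassert>
#include <new>       // std::nothrow, so that new returns null instead of throwing

struct Logger
{
    int lines;
    Logger() : lines(0) {}
    void Write() { ++lines; }
};

struct Cache
{
    int entries;
    Cache() : entries(0) {}
};

// Everything that would otherwise be a static object. Plain pointers only,
// so the struct needs no constructor of its own.
struct DriverGlobals
{
    Logger* logger;
    Cache*  cache;
};

// Would run from the unload routine.
inline void CleanupGlobals(DriverGlobals& g)
{
    delete g.cache;
    g.cache = nullptr;
    delete g.logger;
    g.logger = nullptr;
}

// Would run from DriverEntry; a partial failure is rolled back.
inline bool InitGlobals(DriverGlobals& g)
{
    assert(g.logger == nullptr && g.cache == nullptr);
    g.logger = new (std::nothrow) Logger();
    g.cache  = new (std::nothrow) Cache();
    if (!g.logger || !g.cache)
    {
        CleanupGlobals(g);
        return false;
    }
    return true;
}
```

The constructors now run at a moment you chose, which is the whole point.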

A note of warning - Storing the global variables in the driver extension doesn't automatically make access to them "thread safe". Without the proper locking mechanism, concurrent accesses to the same object will result in another Schrödinger's paradox. Unfortunately, there is no general rule, so only you, the driver developer, can know whether you need to lock your accesses.

Static fields are not a problem as long as they are not "objects". However, make sure static fields are in the "right section" (i.e. not in a pageable section if you need the data at high IRQLs).

Inheritance
Beware mortal, for inheritance can be your enemy. Indeed, it may hide some paged/nonpaged pool issues. If you instantiate an object that can be used at DISPATCH_LEVEL, and that object inherits from another object that is not safe at this level, you will win a WinDbg session for two, all expenses paid. Just kidding: no expenses will be paid and no guests are allowed.

Fortunately, it is possible to protect against this kind of error with some imagination and rigor. If all your "safe" nonpaged pool objects inherit from NonPagedPoolObject, and all your paged pool objects inherit from PagedPoolObject, you can add a "static assertion" in the constructor of NonPagedPoolObject to ensure it does not inherit from a PagedPoolObject (the opposite is not a problem). We'll say more about static assertions later in this article.

Do not hesitate to add redundant safety checks with the PAGED_CODE() macro in all your methods that might directly or indirectly cause paging. As a general rule, this macro is an excellent safety belt.
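One possible shape for that static assertion is sketched below. It assumes a C++11 compiler with static_assert and <type_traits>, which may not match your kernel build environment, and it makes NonPagedPoolObject a template that receives the derived class as a parameter so that the base can inspect the ancestry. IsrState, Config, Broken, and FreshPendingCount are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

class PagedPoolObject {};

// The derived class passes itself as Derived so the base can check it.
template <typename Derived>
class NonPagedPoolObject
{
protected:
    NonPagedPoolObject()
    {
        // Checked when the constructor is instantiated, that is, as soon
        // as some code actually constructs a Derived.
        static_assert(!std::is_base_of<PagedPoolObject, Derived>::value,
                      "a non-paged object must not inherit from a paged one");
    }
};

class IsrState : public NonPagedPoolObject<IsrState>
{
public:
    int pending = 0;
};

// Constructing a Broken would fail to compile:
// class Config : public PagedPoolObject {};
// class Broken : public Config, public NonPagedPoolObject<Broken> {};

inline int FreshPendingCount()
{
    IsrState state;              // instantiates the checked constructor
    assert(state.pending == 0);
    return state.pending;
}
```

The check costs nothing at run time, since it never survives past compilation.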

Virtual Functions
Fortunately, there's no scent of brimstone when working with virtual functions. They are easily managed in kernel mode. However, keep in mind that a vtable will be created if your class contains at least one virtual function, and that every object will carry a pointer to it. This pointer increases the size of each object by at least the size of a pointer on your current architecture. If you are making a collection of 100,000 objects in non-paged pool, this is worth keeping in mind.

In addition, if you allocated your object in the paged pool, reading its vtable pointer might incur a page fault during a virtual function call. Don't forget it. Also, the general C++ rule applies: if you have a single virtual function, or if your object can be used as a base object, your destructor must be virtual.
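The size cost is easy to measure rather than guess. The sketch below compares two otherwise identical hypothetical types, PlainEntry and VirtualEntry. The exact numbers depend on the compiler and the architecture, so nothing is hard-coded.

```cpp
#include <cassert>
#include <cstddef>

struct PlainEntry
{
    int key;
};

struct VirtualEntry
{
    int key;
    virtual ~VirtualEntry() {}   // one virtual function is enough to add a vtable pointer
};

// Extra bytes each object pays for being polymorphic.
inline std::size_t VirtualOverheadPerObject()
{
    assert(sizeof(VirtualEntry) >= sizeof(PlainEntry));
    return sizeof(VirtualEntry) - sizeof(PlainEntry);
}

// Extra bytes for a whole collection.
inline std::size_t VirtualOverheadFor(std::size_t count)
{
    return count * VirtualOverheadPerObject();
}
```

On common ABIs the overhead is at least one pointer per object, and often more once alignment padding is added, so VirtualOverheadFor(100000) is a number worth looking at before committing to non-paged pool.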

Implement Your C++ Classes Correctly
An incorrect implementation of a C++ class can have consequences for performance and stability. In other words, your design may be correct, but the way you implement it may not.

Class implementation has a few lurking concerns. For example, when working with temporary objects, you should be aware that the compiler may create temporary objects during operations such as:

Object A;
Object B;
Object C = A + B;

You can reduce the odds of temporary object creation by declaring a proper "+" operator. However, that is not all. You must also be able to handle assignment to self in the assignment operator. You also have to make sure your constructors will only be used properly. For example, don't hesitate to use the explicit keyword.
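Those three points (a sensible "+", self-assignment, explicit) fit in one small class. ByteBuffer is a hypothetical example, and plain new[] is used only to keep the sketch short: in a driver the storage would come from a pool allocator and every allocation would need a failure check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class ByteBuffer
{
public:
    // explicit: an integer never silently turns into a buffer.
    explicit ByteBuffer(std::size_t size)
        : size_(size), data_(new unsigned char[size]()) {}

    ByteBuffer(const ByteBuffer& other)
        : size_(other.size_), data_(new unsigned char[other.size_])
    {
        std::memcpy(data_, other.data_, size_);
    }

    ByteBuffer& operator=(const ByteBuffer& other)
    {
        if (this == &other)          // assignment to self: nothing to do
            return *this;
        unsigned char* fresh = new unsigned char[other.size_];
        std::memcpy(fresh, other.data_, other.size_);
        delete[] data_;              // release the old block only once the copy exists
        data_ = fresh;
        size_ = other.size_;
        return *this;
    }

    ~ByteBuffer() { delete[] data_; }

    // Compound assignment appends in place and creates no temporary object.
    ByteBuffer& operator+=(const ByteBuffer& other)
    {
        unsigned char* fresh = new unsigned char[size_ + other.size_];
        std::memcpy(fresh, data_, size_);
        std::memcpy(fresh + size_, other.data_, other.size_);
        delete[] data_;
        data_ = fresh;
        size_ += other.size_;
        return *this;
    }

    std::size_t size() const { return size_; }

    unsigned char& operator[](std::size_t index)
    {
        assert(index < size_);
        return data_[index];
    }

private:
    std::size_t    size_;
    unsigned char* data_;
};

// "+" is built on "+=": the by-value parameter is the only copy made.
inline ByteBuffer operator+(ByteBuffer lhs, const ByteBuffer& rhs)
{
    lhs += rhs;
    return lhs;
}
```

Code on a hot path can call += directly and skip even that one copy.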

There are many other issues we could consider, but you will find many of them in the books listed in the introduction. Nevertheless, it's important to remember that what can be problematic in user mode becomes catastrophic in kernel mode.

Templates and Inlined Code
As you may know, there is still no full support for templates by the Microsoft Compiler, or any compiler for that matter. This means that most of the time you will have the templates' implementation in the header file. That implies that the template code will be inlined, void where prohibited by law.

Judge Compiler is the law. And it might say: "I'm not going to inline this code; I'm going to compile it somewhere else in the binary, add a function call, and ni vu ni connu (no one the wiser)." This applies to any "inlineable" code. The question is, "Where is it going to be compiled?" The answer is: in theory anywhere, including in a paged code section. Since the #pragma code_seg directive wasn't designed with C++ in mind, you cannot use it to force these code thunks to reside in a specific place in memory.

Let's imagine you are running at DISPATCH_LEVEL and you call a template function whose code ended up in a paged section: page fault. Page faults at DISPATCH_LEVEL are extremely well handled by the kernel in a scheme that is technically called "die Betriebssystemskernsvernichtung". In other words: BSOD.

Nobody can guarantee that code thunks are always going to end up in non-paged sections. By disassembling your driver you can check that everything is fine, which is good to do in the early stages of development. For the moment, there is no report of such code being generated in a paged section, but you could be the first. In addition, you have to be very careful when switching to a new architecture.

The previous thought brings up a useful point. Disassembling your driver is an excellent exercise to understand the implications of C++ in kernel mode. You will learn more by disassembling your driver than by reading 100 articles about programming.

A feature that can be risky to use in kernel mode is template meta-programming. The problem with meta-programming is that the debugger will not be of much help. Debugging a driver is hard enough, so don't overburden it. On the other hand, meta-programming can significantly reduce the amount of code required to implement a feature, which reduces the odds for a bug. It will also increase your success with women.

Stack Me Up
The stack is precious. The stack is rare. As a kernel developer you must worship the God of Stack and never abuse It. You might ask, "What does this have to do with C++?" Well... this is what:

void myfunction(void)
{
    CMyEliteObject MyEliteObject;
    CMyUltraUntzObject MyUltraUntzObject;
    CMyOlalala MyOlalala;
    // ad nauseam
}

Don't get me wrong, this is good code, especially because it can prevent resource leaks if the constructors and destructors take charge of acquisition and release. However, if your objects have numerous fields, you can easily exhaust the stack. To prevent this, you can use dynamic memory allocation for the biggest objects. If you fear you might forget a delete, you can write an auto_ptr facsimile.
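Such a facsimile can be very small. The sketch below is one possible shape, not a reference implementation: ScopedPtr, BigObject, and UseBigObject are hypothetical, copying is simply forbidden instead of transferring ownership the way auto_ptr did, and std::nothrow stands in for a pool-backed operator new that returns null on failure.

```cpp
#include <cassert>
#include <new>       // std::nothrow

// Owns one heap object and deletes it when the scope ends.
template <typename T>
class ScopedPtr
{
public:
    explicit ScopedPtr(T* p = nullptr) : ptr_(p) {}
    ~ScopedPtr() { delete ptr_; }

    T* get() const { return ptr_; }
    T* operator->() const { assert(ptr_ != nullptr); return ptr_; }
    T& operator*() const { assert(ptr_ != nullptr); return *ptr_; }

    // Gives up ownership, for the rare case the object must outlive the scope.
    T* release()
    {
        T* p = ptr_;
        ptr_ = nullptr;
        return p;
    }

private:
    ScopedPtr(const ScopedPtr&);             // declared, never defined: not copyable
    ScopedPtr& operator=(const ScopedPtr&);

    T* ptr_;
};

static int g_liveBigObjects = 0;   // bookkeeping for this sketch only

struct BigObject
{
    char payload[4096];            // too large to put on a kernel stack casually
    BigObject()  { ++g_liveBigObjects; }
    ~BigObject() { --g_liveBigObjects; }
};

// Only a pointer lives on the stack; returns the live count seen inside the scope.
inline int UseBigObject()
{
    ScopedPtr<BigObject> big(new (std::nothrow) BigObject());
    if (!big.get())
        return -1;                 // allocation failed
    big->payload[0] = 1;
    return g_liveBigObjects;
}
```

The stack cost drops from 4 KB to one pointer, and every return path frees the object.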

Smart Pointers
Smart pointers are a great way to reduce memory leaks. It is very tempting to implement a smart pointer in kernel mode. To start, you can use smart pointers only on your objects. There will be no bad surprises since you rewrote the new and delete operators. However, be extremely careful when your pointers have to graduate to "the outside world". Only your driver understands the smart pointer; "the outside world" doesn't.

Smart pointers can also be a way to limit the number of new/delete operations. However, a related pitfall is that your smart pointer might not be able to account for all the references.

For example, look at Figure 4.

NTSTATUS SomeDispatchWriteRoutine(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    // some code...
    SmartPointer<MyEliteResource> spMyEliteResource;
    IoCopyCurrentIrpStackLocationToNext(Irp);
    IoSetCompletionRoutine(Irp,
        SomeWriteCompletionRoutine,
        spMyEliteResource,
        TRUE, TRUE, TRUE);
    return IoCallDriver(pDeviceExtension->pLowerDeviceObject, Irp);
}

Figure 4 - Use of Smart Pointers...What's Wrong Here?

The problem in Figure 4 is that it's creating an access violation. Why? When you leave the scope of SomeDispatchWriteRoutine, the smart pointer is going to free the object. When the completion routine is subsequently called, it will be left with a dangling pointer. There was no additional reference taken when the smart pointer was passed to IoSetCompletionRoutine() since it takes a PVOID as argument.

There are at least two solutions. The simple one is to avoid smart pointers when talking with the system functions. Another one is to take an extra reference before passing the smart pointer to IoSetCompletionRoutine() and make the completion routine "smart pointer aware" so that it removes this reference. In this case the advantage of smart pointers is sharply reduced, so the first solution is a good choice.
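The second solution can be sketched in user mode. Everything below is a stand-in: PendingRequest plays the role of the IRP with its completion routine and PVOID context, RefCounted and SmartPointer are minimal hypothetical classes, and the counter uses a plain long where a driver would need interlocked operations.

```cpp
#include <cassert>

// Intrusive reference count; the object deletes itself on the last Release().
class RefCounted
{
public:
    RefCounted() : refs_(0) {}
    void AddRef() { ++refs_; }
    void Release() { if (--refs_ == 0) delete this; }
protected:
    virtual ~RefCounted() {}
private:
    long refs_;   // not thread safe: a driver would use interlocked operations
};

template <typename T>
class SmartPointer
{
public:
    explicit SmartPointer(T* p) : p_(p) { if (p_) p_->AddRef(); }
    ~SmartPointer() { if (p_) p_->Release(); }
    SmartPointer(const SmartPointer&) = delete;
    SmartPointer& operator=(const SmartPointer&) = delete;
    T* get() const { return p_; }
private:
    T* p_;
};

static int g_liveResources = 0;   // bookkeeping for this sketch only

struct MyEliteResource : RefCounted
{
    int bytes;
    MyEliteResource() : bytes(0) { ++g_liveResources; }
    ~MyEliteResource() { --g_liveResources; }
};

// Stand-in for an IRP carrying a completion routine and an untyped context.
typedef void (*CompletionRoutine)(void* context);
struct PendingRequest
{
    CompletionRoutine routine;
    void*             context;
};

// "Smart pointer aware": it drops the reference taken on its behalf.
inline void SomeWriteCompletionRoutine(void* context)
{
    assert(context != nullptr);
    MyEliteResource* resource = static_cast<MyEliteResource*>(context);
    resource->bytes = 512;        // still valid, thanks to the extra reference
    resource->Release();
}

inline PendingRequest SomeDispatchWriteRoutine()
{
    SmartPointer<MyEliteResource> spMyEliteResource(new MyEliteResource());
    spMyEliteResource.get()->AddRef();   // extra reference owned by the context
    PendingRequest request = { SomeWriteCompletionRoutine, spMyEliteResource.get() };
    return request;
}   // the smart pointer releases its own reference here; the object survives
```

The manual AddRef()/Release() pair is exactly the bookkeeping smart pointers were meant to hide, which illustrates why the first solution is often the better one.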

Do Drivers Dream of Electric C++ Libraries?

The Standard Template Library
The STL will not work in kernel mode and cannot be used. Dang, it's not even in the crt directory of the WDK.

However, the temptation to use the STL in kernel mode is almost irresistible. The STL is the absolute weapon of the C++ programmer. The algorithms/containers paradigm would be an incredible benefit in kernel mode, especially for file system-related drivers, to say nothing of its maturity and stability. Think how simple and reliable those complex file system routines would become.

But let's not get our hopes too high:

1. The STL throws exceptions.
2. The STL allocates memory.

Porting the whole STL to the WDK is pointless and very hard. However, reasonable subsets of the STL could profitably be ported.

The Boost Library
The Boost library, another widespread C++ tool, relies heavily on the STL. However, some parts could be ported (bis repetita). The parts that immediately come to mind are the generic programming modules (static assertions, type traits, operators...), the smart pointer classes, algorithms, and containers. Why do C_ASSERT() when you can do BOOST_STATIC_ASSERT()?

Other Libraries
This is not actually a C++ issue, but the need to use a library made for user mode in kernel mode is frequent. One can immediately think about compression and cryptography, but truly, the spectrum is much wider.

Keep in mind that if your library throws an exception, or in one way or another allocates memory, then using it in kernel mode will be painful. Actually linking that eliteness.lib library and seeing if it works is really the way to go, and by really the way to go we mean that it's going to blow up in your face.

Speed, Performance...
Writing your driver in C++ should not hinder performance. However, when you are within a critical path, you have to be careful about:

The number of memory allocations you perform

Hidden function calls such as constructors, destructors, adjustor thunks...

Encapsulated processing - We're not talking about the thunks, but about the work hidden inside your objects, whose cost is not necessarily obvious at the call site.

In kernel mode there are some cases where it's very important to execute as fast as possible. The first example that comes to mind is of course an ISR. When writing your ISR, make sure there is no hidden unnecessary resource allocation and that the compiler isn't going to write another hidden call to SETI@Home.

Talking to the Outside World
The interface of your driver must work with drivers written in C. Your driver must not expect other drivers to pass it objects or complex structures that are only available to C++. The example given above for a smart pointer exhibits how wrong things can go if you assume everybody understands your classes.

It might also sound quite tempting to make your user mode program send C++ objects to your driver, but this is not going to work easily.
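What does cross the boundary safely is plain data with a layout both sides agree on. The sketch below shows one way to pin that down at compile time. WriteRequest and ParseWriteRequest are hypothetical, the field layout is invented, and <type_traits> is assumed to be available in your build environment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical request shared between a user-mode program and the driver:
// fixed-width fields, no pointers, no virtual functions, no constructors.
struct WriteRequest
{
    std::uint32_t version;
    std::uint32_t length;
    std::uint64_t offset;
};

// The build breaks if someone later turns this into a "real" C++ object.
static_assert(std::is_standard_layout<WriteRequest>::value,
              "WriteRequest must keep a C-compatible layout");
static_assert(std::is_trivially_copyable<WriteRequest>::value,
              "WriteRequest must be copyable with memcpy");
static_assert(sizeof(WriteRequest) == 16,
              "WriteRequest size is part of the interface");

// Validates the size before touching the caller's buffer, then copies it.
inline bool ParseWriteRequest(const void* buffer, std::size_t size, WriteRequest& out)
{
    if (buffer == nullptr || size < sizeof(WriteRequest))
        return false;
    std::memcpy(&out, buffer, sizeof(WriteRequest));
    return out.version == 1;
}
```

A class with a vtable or a smart pointer member would fail these assertions, which is the desired outcome.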

Your Friends
Dynamic assertions, static assertions, the Driver Verifier, and a checked build are great friends and are not restricted to C++ (this list is by no means exhaustive). The proper use of these tools will quickly tell you if what you are doing is unholy. When implementing a new C++ profanity it is a good reflex to pass it through a heavy testing chain.

If you write collections or similar basic tools, you need to exhaustively test them. That means, you must have a test driver that does all the possible insertions, removals, or whatever else is possible in all possible contexts before considering your routines viable for live testing. Also, remember that there is no such thing as single threading in kernel mode.

C++ is more strongly typed than C. Because of this, it is possible to detect many errors at compile time through liberal use of the const keyword, generic programming, and static assertions.

A Question of Architecture
It is important to remember that Windows NT is a multi-platform system. Since the introduction of the x64 architecture, driver developers have had to adopt a "multi-platform" mindset even more than before. The compiler behaves differently on x86 and x64, but the most radical changes may come from the IA64 compiler. When recompiling a C++ driver for a different architecture, it is strongly advised to revalidate the compatibility of the C++ features you are using.

Great Things to Do
Once you are comfortable with C++ in the kernel world, you can start unleashing its power. Here are some ideas:

String class - A great way to simplify, lighten, and secure all the code around string manipulation.

Collections - The collections provided by the kernel may not be enough for you. C++ can help you write powerful collections à la STL.

Secure buffers - By encapsulating buffers in a class, you can reduce the risk of buffer overflows and resource leaks to a minimum.
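As an illustration of the secure buffer idea, here is a minimal fixed-capacity buffer whose only write path checks bounds. SecureBuffer is a hypothetical name and the design is deliberately bare: no dynamic allocation, no copying, and no scrubbing of the contents on destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fixed-capacity byte buffer: every write goes through one checked function.
template <std::size_t Capacity>
class SecureBuffer
{
    static_assert(Capacity > 0, "a buffer needs at least one byte");

public:
    SecureBuffer() : used_(0) { std::memset(data_, 0, Capacity); }

    // Refuses the write instead of overflowing.
    bool Append(const void* source, std::size_t length)
    {
        assert(used_ <= Capacity);               // class invariant
        if (source == nullptr || length > Capacity - used_)
            return false;
        std::memcpy(data_ + used_, source, length);
        used_ += length;
        return true;
    }

    std::size_t size() const { return used_; }
    const unsigned char* data() const { return data_; }

private:
    SecureBuffer(const SecureBuffer&);            // declared, never defined: not copyable
    SecureBuffer& operator=(const SecureBuffer&);

    unsigned char data_[Capacity];
    std::size_t   used_;
};
```

Because the array lives inside the object, a large Capacity brings back the stack concerns discussed earlier, so such a buffer would usually be a member of a pool-allocated object.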

Pair programming can be extremely productive when working on such delicate matters. And don't forget about the WDF! Don't reinvent the wheel!

Conclusion
C++ is not straightforward in user mode, and kernel mode worsens things. Unfortunately, an exhaustive coverage of all possible issues is not possible in this article. Nonetheless, you do not have to switch violently and instantaneously... You may as well progressively inject more and more C++ into your driver whilst validating that it doesn't alter its behavior. When you are confident with one feature, jump to the next.

"Ah, if only I was writing C++, I could have been more concise and efficient..." Have you found yourself muttering this mantra? If you have, then it probably means you're ready to give it a try. Even so, please watch your step.

Edouard Alligand masquerades as an information security expert with a taste for cryptography and system programming. He lives in Germany with his secret pen and paper based RSA breaking machine. It is of course possible to feed his hungry spam filters at edouard@fausse.info.

Thursday, July 22, 2010
