Csharp and Excel Interoperability
From MetaSharp
Article Author(s): Audric Thevenet
All Rights Reserved.
Contents |
Introduction
When it comes to Excel extensibility, the usual way to do it is to do some VBA (Visual Basic for Applications) within VBE (Visual Basic Editor). Basically this VBA code can be contained within a sheet, a module, or an external Add-in. VBA has various pro and cons. In 2002, with the apparition of .NET, people began to create some Add-ins in .NET for Excel. .NET Add-ins also have some pro and cons. Let's try to sort it out.
While we will only talk about Excel here, most applies to all other Microsoft Office applications. This page was done with informations grabbed as much as possible directly from the source: the Microsoft .NET office interoperability american team. A very special thanks to Richard Taylor and his enlightenments and to all the other people who seek, like me, how to do some XLA in C#.
Lexical
Let's start by defining clearly the vocabulary we will use.
- Add-in: container of programmed functions and/or classes that Excel can load and use. Excel can understand them if done in VBA or hosting a Com interface.
- Book: see Workbook
- DLL: Add-in (written in .NET here) hosting a Com interface
- Module: in a project, a VBA text area (you can see those in VBE)
- Project: an Excel file (XLS, XLA, ...), it contains 1 workbook and 0 to n worksheets
- Workbook: the group of all the worksheets of a project, it exist through a VBA text area (usually the events related to the project, the menus, toolbars, ...)
- Worksheet: a worksheet is a tab content in Excel, each of those can host some VBA code in VBE
- Reference: a link to another file within a project
- Sheet: see Worksheet
- XLA: Add-in written in VBA (basically an Excel project without any visible worksheet)
- XLS: Excel file containing at least 1 visible worksheet
- UDF: User Defined Function (function callable from a Cell)
- Sub: Function returning no result in VBA, it might be called using the keyword "Call".
- Pub: Function returning a result, if the result is an objet, it must be assigned with a Set instruction.
VBA
To extend an XLS functionnalities with VBA we can add some VBA code in 4 areas within VBE:
- project workbook: all the code related to the project's global events, for example:
- Public Sub Workbook_Open() ' create menus, toolbars, ribbons, ..., initialize some variables, ...
- Public Sub Workbook_BeforeClose(Cancel As Boolean) ' delete those controls and clean what you need
- project worksheets VBA text areas: the code specific to only one worksheet
- project modules: the code used by 2 or more worksheets
- external XLA: the code used by 2 or more XLS
Resources
You will find here some good web resources to code in VBA:
.NET
This VBA introduction sets the context of where we were. But where do we want to go? Well, we either want to create an Excel Add-in in .NET or host an Excel instance within a .NET GUI application depending on the needs. Either way we want to interoperate Excel and .NET (only C# will be discussed here). We will mostly talk about the Add-in as including an excel workbook control in a GUI application is not that different and far less used in the real world anyway.
Interoperability Components
PIA
Each version of Office came out with some PIA (.NET Primary Interop Assemblies) dlls. All Excel Add-ins are layers on top of those PIA.
- 2002:
Office XP PIA
- 2003:
Office 2003 PIA
- 2007:
Office 2007 PIA
Com Add-ins
The first external Add-ins for Microsoft Office were Com Add-ins. They are your only office extension solution if you're working with Excel 2002 XP or earlier versions. By exposing a Com compatible class to Microsoft Office you can instantiate it as an object then call its methods and so on... Those Add-ins are kind of low level, they give you good control on what happens but are not very easy to write. They give you the ability to:
- create functions callable from your VBA code
- create UDF (Use Defined Functions) callable from Excel cells through Automation
- access the Excel object model through IDTExtensibility2 interface
Using Automation to get UDF in Excel cells
Adding the automation to a Com Add-in has only 1 purpose : it enables accessibility to your class methods as if they were procedural functions in Excel cells. A simple automationed Com Add-in might only give you access to standard types for your functions arguments and return types. Automation doesn't grant access to the Excel model (workbooks, worksheets, range, ...). There is no template in Visual Studio for those Add-ins.
Shared Add-In
A shared Add-in is made from a Visual Studio project creating a Com Add-In implementing the IDTExtensibility2 interface. This grants you access to the Excel model (workbooks, worksheets, Range, ...).
This kind of project is still today what looks the most like a C# XLA. They are often used for formal commercial development deployed to external clients. While they are more difficult to make and maintain, they do give you deeper control over the Add-in and can be made somewhat version-agnostic (that is, you can use your own IA instead of a specific Office PIA library, and/or can do more in late binding to handle multiple Office versions with lower risks of compatibility problems).
Using XLL and XLA Wrappers to get UDF Custom Descriptions
Unfortunately, Excel's architecture for Automation Add-ins does not let you add a custom description. Because it simply allows you to call any public Com exported function in an object you select, there is no place to set a description or help for that function like you can do inside of an XLA. So to get that ability, you would either need to use an XLA or XLL wrapper which then calls your managed code:
- an XLA wrapper is something we want to get away from (because we want to code in C# only) so I won't discuss this option further (it is just about creating an XLA exposing 1 function + description for each C# method).
- an XLL wrapper is something you could develop in C++, or you can purchase one that makes the job easier from .NET. While Microsoft cannot recommend any specific 3rd-party product, if you really want this functionality and are willing to use a Microsoft partner developed framework, you might want to take a look at the commercial Add-in Express. It offers an Excel function category dialog where you can set descriptions, help topics, etc... It does this by using their own XLL stub code (in C++) to add the worksheet function, and then their framework stub calls your managed code from the worksheet function. It is an extra layer, but it makes development easier. Doing such an XLL usually targets commercial developments.
Using Com Shims to get Trust and Isolation
The Com Shim Wizard is a tool that creates a native DLL in order to host your managed dll. It offers 2 things that you don't get without a shim:
- without the shim, your dll path is not to your assembly, but to mscoree.dll (the CLR runtime). Mscoree.dll is NOT digitally signed and not considered completely trusted (because it can load any managed code assembly, trusted or not). So if you want to have an Add-in loading when Macro Security in Office is set to Medium or High, you need to add a digital signature that the user accepts as a Trusted Publisher. You cannot sign mscoree.dll, and therefore you need another dll to stand in for mscoree.dll that you can digitally sign and know is trusted because it only loads your managed assembly and no others.
- the 2nd thing it offers is Add-in isolation. Similar to VSTO, the shim can set your code to run in a private AppDomain so that its global state cannot be interfered with by other managed Add-ins that a user may also load into the same host process. Setting up a private AppDomain also helps in loading sub-assemblies or other components out of your Add-in path instead of a global path (like the GAC) if you ship multiple files for your Add-in.
VSTO
Eventually Microsoft added a layer called VSTO (Visual Studio Tools for Office) on top of the PIA to simplify their use. VSTO adds some assistants for Visual Studio that will generate some code for you and some easyness for your deployments. VSTO is mainly used to put .NET controls (menus, buttons, ...) in an XLS file and react to their events. VSTO is currently targeted at enterprise development. That is to say, most of the features/perks of VSTO were designed and developed with an in-house company solution in mind.
VSTO does not provide any way to create UDF (User Defined Functions) out of the box, you need those exposed through Com Automation still. But that doesn't mean you have to create 2 separate projects. VSTO is preferred over shared Add-ins as a general rule. But since VSTO has no method to offer UDF/worksheet functions, you cannot have a purely VSTO-based solution, and therefore the benefits of using VSTO are not as clear cut. VSTO does not require a shim!
- 2003: VSTO 2003 (Office 2003 for Visual Studio 2003)
- 2005:
VSTO 2005 (Office 2003 for Visual Studio 2005) Runtime
- 2007:
VSTO 2005 Second Edition (Office 2007 for Visual Studio 2005) Runtime
- 2008:
VSTO 3.0 (Office 2007 for Visual Studio 2008) Runtime
- 2008:
VSTO 3.0 SP1 (Office 2007 for Visual Studio 2008) Runtime
There are 2 different VSTO project types: document-level workbook customizations, and application-level Add-ins. Although they share a common name, they are in fact very different in how they are implemented. They only work for Office version 2003 or later. In both cases, you do get some enhancements over the Excel PIA, but they still use the Excel PIA and are still dependent upon it.
So, VSTO solutions are generally targeted for a specific Office version to bind to the correct PIA library. That means it is not uncommon to have 2 versions of your project, one for Office 2003 and another for Office 2007. Technically, a project built for Office 2003 should still be able to load and run in Office 2007, but there are natural limitations to that because you still have to build and compile to the older object model. In addition, if you build for Office 2007, you cannot run that in Office 2003 (because 2003 will not support the 2007 object model and PIA).
If you plan to use VSTO and to target multiple versions, see this documentation. VSTO isolates each Add-in into a separate AppDomain in .NET CLR. Since the UDF worksheet functions work by Com interop and not by VSTO, those functions would need to be in a separate class and would not load in same AppDomain as your VSTO code (they can still be in the same assembly, but not in the same class). In a simple sample, this would not mean very much and may be OK.
But where things get tricky is when your worksheet functions need to call objects that were created and being used by the VSTO app-level code. Because .NET objects are private inside separate AppDomains, you could not immediately share such objects or static variables (shared static variables are shared only inside the AppDomain, not outside). So if your worksheet functions are likely to be functionally separated from your command functions, then you could use VSTO here. But if they are connected in any meaningful way, I would suggest you stay with a shared Add-in because that is also a Com interop object and therefore will be inside the same AppDomain as the automation Add-in part of the project (even inside the same class if you want).
VSTO Document-Level Customization
It is designed to work only against a specific workbook (similar to having macros directly inside the XLS file). Those Add-ins replace many of the PIA objects with enhanced objects for data binding and caching support, and therefore do a lot more than their application-level counterpart. VSTO document-level solutions for example only work with Office Pro/Enterprise SKUs, and use deployment schemas that assume an in-house deployment.
VSTO Application-Level Add-In
It is a managed version of a standard Office Com Add-In project. Those Add-ins offer a very thin layer over the native Excel Com Object Model, and therefore are doing little beyond exposing you directly to the Office PIA libraries. VSTO application-level solutions are not bound by these restrictions, but still assume an in-house development as the main goal. It's kind of an enhanced shared Add-in in terms of functionalities.
VSTO Power Tools
Very recently, Microsoft released VSTO Power Tools. It enhances the development by providing some extensions libraries to VSTO and Com Add-ins. The Office interop API Extensions can be used with both Office 2003/2007, though .NET 3.5 is required in both cases:
You can use the library extensions in either VSTO or shared Add-in. They are not dependent on anything in VSTO, rather they are based on a new feature added to the .NET 3.x Framework which VS 2008 provides (they don't work with VS 2005 and 2.0 Framework). The extensions work over the existing PIA, and are not a replacement, so you would have to ship both libraries with your project if you use them. I can't speak to their quality for production products. They are new, and rely on a new feature in Visual Studio .NET 2008 that has not been widely used. But the tools were created and supported by developers on the VSTO product team, so I'm sure they took pains to make sure it suitable to use in a production environment.
Installation on the Developer Side
VSTO is only shipped in specific Visual Studio versions:
By the way, if you have any paying version not including VSTO in standard, you can grab here the latest "light" version of VSTO for free (it does not include all templates, etc...) and its updates:
2006-11-05 VSTO 2005 Second Edition (Office 2007 for Visual Studio 2005)
2005-10-11 intellisense xml files
2006-01-16 intellisense code snippets
2007-07-18 update
If you have Visual Studio 2005, you will also need to install this patch on your development computer to get your shared Add-ins working after deployment on other computers:
Detailled Topics
Com Add-In or VSTO?
To replicate a XLA behavior, you can use a shared Add-In project for loading and handling application level tasks (like menus, toolbars, and events), and then also expose the class through Com interop to register it as an automation Add-In (which lets you call public methods in the class from a cell as a worksheet function). This would get you most of the functionality you have in an XLA from purely managed code.
Before we go further, however, I need to add a word of caution here. Custom worksheet functions are typically made in VBA (as XLA) or in C++ (as XLL), where they are designed to run quickly and return results without the risk of reentrancy. Since these types of functions are called during recalculations, and occur frequently, performance and control over thread context switching are key factors in why Microsoft still recommend either XLA or XLL project types for calculations. Managed code (in contrast) has to go through 2 layers of data marshalling each time the function is called, and given the CLR design the function can be called on worker threads which can cause unwanted thread context switches, which can lead to reentrancy and errors in rare cases. For these reasons (performance and stability), Microsoft still recommends the older project types (XLA or XLL wrapper) for worksheet functions. While you can create a project that lets you call C# functions as worksheet functions, you should be aware of the limitations that it has upfront, and Microsoft does not recommend this approach for mission-critical applications.
Here is how you can accomplish this using VS 2005 (or 2008). You start by creating a shared Add-In project in Visual Studio. The sample shows how to add a CommandBar and button which you can use to handle custom actions. To sink application level events, I would suggest you use a late bound approach because there are known problems with .NET delegate event sinks and Excel's OLE capabilities. Take a look at this web posting, and the xlEvents_Sample.zip project that you can download at the bottom.
The compatibility issue between delegates in .NET and OLE is not around events per se but in .NET's handling of OLE objects passed into the events. Because of .NET garbage collection, it can keep references to OLE objects that have otherwise been told to close and those references keep it from closing. This has bad side effects which are documented in the web post, but are "expected" based on the design of the .NET Framework. So there is nothing that can automatically fix that sort of issue, you have to be mindful that Excel can be used for OLE and may call events for OLE objects, and to avoid those side effects you need to manually release the reference count that .NET holds on them. That problem is not specific to Shared Add-ins, it applies to any managed code running in process to Excel while Excel is used for OLE.
To add worksheet functions, you need to make those functions public and Com Visible, then you can register the assembly type library with Excel as an automation Add-In. Automation Add-ins are Com components whose public members can be called from cell functions similar to public XLA functions. For example, see one of the following code samples:
As you can tell, you will need to craft something custom if you want to get all your code into 1 assembly. Naturally, another approach that might be less work would be to use 2 projects: a .NET assembly core component and an XLA stub which simply calls the managed core component when needed. If language was not an issue, a C++ XLL project would be the best option. The new Excel 2007 XLL SDK could help you there, but it is C/C++ only. So we won't discuss this option here further either.
Leaning toward VSTO seems more logical. If it is the case for you too, I would suggest you stick with an XLA or XLL wrapper for the worksheet function capability rather than doing an automation Add-in. If you go with the Automation Add-in, I think a shared Add-in will end up being a better compliment piece than VSTO. So I guess it just depends on how much you want to move away from XLA in this next release. Assuming UDF features are added to VSTO in the future, it might be better to go that direction. But I cannot say when/if that happen exactly, so I don't want to sell you on a choice that is not practical in the short term.
Create an UDF
If you want to use UDF worksheet functions directly from C# (without using an XLA wrapper) in VSTO, then you will likely still need a shim for that component, because Excel does not load automation Add-ins by VSTO, it loads them by Com. So even though it may exists in the same DLL, Excel thinks of them as different Add-ins and treats them separately. If you decided to stick with an XLA wrapper (or go with your own or a custom 3rd-party XLL wrapper), you would not need the shim, since the macro security would apply to the wrapper and not the library called.
To do this you need to follow these steps:
- make the function you want to call public and compile the assembly with the Register for Com Interop setting checked (should already be set if this is a shared Add-in project)
- You need to add a "Programmable" subkey in the registry for your assembly's Com registration (this can be done by code, look at the ComRegisterFunction Attribute you can use to run code during Com Interop registration/unregistration)
- you need to add the Add-in in Excel using the Tools | Add-ins dialog, you press the Automation button, then in the list you find and select your managed class (it is typically named something like AssemblyName.ClassName in the list).
- Once It is added, it will appear in the Add-ins dialog.
- Press Ok to exit the dialog and the public methods of your class are now callable from the worksheet.
Automatic Registration
There are 2 ways you can register an Automation Add-in to be loaded on startup in Excel:
- you can run custom code from your setup to add an "OPEN[x]" string key under the user's Excel Options (HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Options), where the string setting is in the following format: "/A "ProgID_of_Automation_UDF_Add-in"". Because there may be more than one UDF Add-in set to load on startup, the OPEN keys are numbered, so you cannot use a fixed/static registry setting; you will need to use custom setup code to loop through existing settings add a new one when an empty slot is found. Once the key is set, Excel will automatically load the UDF Add-in on startup.
- the 2nd approach is to add it dynamically from inside Excel from VBA or shared Add-in or VSTO, etc... To do that, you call Application.Add-ins.Add("ProgID"), then set Add-in.Installed = True from the newly added Add-in object.
Calling a Method as if it was an XLA function
An object has to be instantiated somewhere for you to call a function in it. Exporting static functions and calling them from VBA using a Declare statement is not directly supported by C#. Though you can access an already running C# object. With a shared Add-In, you can set a pointer to yourself in the ComAdd-in object of the IDTExtensibility2::OnConnection callback. That way a VBA code caller can do something like this: Application.ComAdd-ins(1).Object.DoSomething(MyArgument). If you are using VSTO, you have no direct connection to your object from VBA. So you have to use a different approach (maybe you are already doing that?) – such as creating a VBA subroutine that you can call from C# (via Excel's Application.Run method) that passes a pointer to your class and keeps a reference of it in VBA.
Speeding up Excel C# interop
Once the code is JIT compiled on the client, it will run relatively fast. The bottlenecks are the calls to the function itself or any internal calls you might make back to Excel through the PIA. Those are Com calls which .NET has to marshal in/out of native to managed data types. As long as you use managed code, you cannot really avoid that marshal hit, so the trick is to minimize that as much as possible and do as much as you can in managed code once called before returning back to Excel.
Add-In Deployment
Fixing Shared Add-In Issues
If you are using .NET 2.0 Framework, then you will need to distribute a patch for Office 2003 clients. The patch you want to download is KB908002. This is a Visual Studio download which provides an additional pre-requisite install option when making your setups in Visual Studio. The pre-requisite option is called "Shared Add-In Support Update for the Microsoft .NET Framework 2.0". When you add this to your MSI setup options, it will automatically install the same fix as KB907417, plus it will install two other MSI packages needed for shared Add-ins. So this is package you want to use (don't forget to deliver them, they are in the generated sub directories). There are steps on how to build your setup using this patch in the article.
Msi versus Exe
You should provide both. The exe is needed if:
- the client does not have MSI installed, or needs to update their MSI version by prompting the user to download and install the current MSI setup from the Microsoft web site
- on Windows Vista clients to elevate permissions if your setup needs to install files/registry keys in any protected system-wide location (like the Windows folder, or the HKEY_LOCAL_MACHINE hive, etc.). Permission elevation has to be done by an EXE at launch time, so it cannot be done from an *.msi file itself. Other than that, you could install by the *.msi package alone.
Installation in User Mode
You can do that, provided you install the program to the Program Files folder, you don't need to change system files, and you don't register anything under the HKLM hive. This last requirement means the Add-in would need to be set up per-user instead of per-machine. You normally pick that option from the shared Add-ins wizard when you first create the project, but you can manually change that if needed. If you want the Add-in to be installed once for all users, then you will need admin rights to install the setup.
ClickOnce
ClickOnce does not work for Office 2003 Add-ins, only Office 2007 VSTO projects which use the .NET 3.5 Framework. I also do not think you can use ClickOnce if you go with an Automation add-in part for the UDF/worksheet functions. The key there is that ClickOnce only supports registry-less Com objects for managed applications that deploy their own manifest. However, you are an Add-in loading in someone else's process space, so you do not control the manifest file for Excel itself, and you cannot rely on registry-less Com. The component has to be added to the registry for Excel to treat it as an automation Add-in, and so you will have to use MSI package if you want that. If you use an XLA/XLL wrapper instead of an Automation Add-in, you might be able to use ClickOnce, but again it is only supported in VSTO for Office 2007.
Resources
The Excel C# Add-in Sample
Here will be a minimalist Excel C# Add-in sample.
under construction : under investigation
What about VSTA?
In regards about managed C# programming inside an Office application, there is a technology (developed by the former VBA team) that acts similar to VBA but is entire .NET based. The VS2005 version of the technology is publically documented and can be used by 3rd-parties in addition to Microsoft products. It is called VSTA (Visual Studio Tools for Applications), and you can think of it as a more VBA-like version of VSTO (check here). VSTA is currently available in InfoPath 2007 for use in forms programming (instead of VBA). Although this technology is similar in many ways to VBA, you should also know that VBA will continue to be available in Office for another two versions. So VSTA is not a replacement for VBA as such, it is something that runs side-by-side. All of this said, I cannot tell you whether Excel will incorporate VSTA in Office 14 (or beyond). It is fair to say, they are interested in giving programmers more .NET capabilities inside the product, but at this time I cannot say what the exact course will be or when it will occur. VSTA is not available in Excel currently.
