
Welcome to a new ongoing series on writing optimized Vault code. In many cases, there are multiple ways to get data from the Vault server, but some ways may be quicker than others depending on your needs.
Many factors affect Vault performance: network speed, server configuration, available memory, concurrent users, and so on. These articles will not focus on any of that. I will cover only the coding aspect of performance.
In this article, I will focus on what I like to call the Golden Rule of Vault API performance.
| Golden Rule: Minimize the number of Web Service calls. |
You should always follow the Golden Rule... except in cases where you don't. I'll be sure to point these cases out when they come up.
Looking through the API, you probably noticed some functions come in pairs, one that deals with a single object and one that deals with arrays of objects. For this example, I will focus on GetFileById and GetFilesByIds in the Document Service.
|
File GetFileById ( Long fileId );
File [] GetFilesByIds ( Long [] fileIds );
|
The two functions do the same thing: they take a file ID and return the File object. The difference is that one can get multiple objects in a single call, while the other can only get a single object.
According to the Golden Rule, you should always use GetFilesByIds since it minimizes the number of API calls. Technically, GetFileById could be removed from the API entirely. It's a bit easier to use when you want a single object, but that's its only advantage. Most of these non-array functions are holdovers from early versions of the API. When new API functions are added, there is usually only the array version.
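To make the difference in call counts concrete, here is a minimal sketch. It is written in Python for brevity and does not use the real Vault client: `FakeDocumentService` and its two methods are hypothetical stand-ins that only count simulated round trips.

```python
# Hypothetical stand-in for the Document Service. It is NOT the real Vault
# client; it only counts how many simulated web service calls were made.
class FakeDocumentService:
    def __init__(self):
        self.calls = 0  # number of simulated round trips

    def get_file_by_id(self, file_id):
        """Mimics GetFileById: one call returns one object."""
        self.calls += 1
        return {"Id": file_id}

    def get_files_by_ids(self, file_ids):
        """Mimics GetFilesByIds: one call returns many objects."""
        self.calls += 1
        return [{"Id": file_id} for file_id in file_ids]


ids = list(range(1, 101))  # 100 made-up file IDs

# Single-object version: one round trip per file.
single = FakeDocumentService()
files_single = [single.get_file_by_id(file_id) for file_id in ids]

# Array version: one round trip for all files.
batch = FakeDocumentService()
files_batch = batch.get_files_by_ids(ids)

print(single.calls, batch.calls)  # prints: 100 1
```

Both approaches return the same 100 objects. Only the number of round trips differs, and that is the quantity the Golden Rule asks you to minimize.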
I ran some tests using a vault that I loaded with 100,000 files. I then fetched arrays of File objects using the two mechanisms. For example, I would get 100 File objects with 1 GetFilesByIds call and compare the time against getting the same data with 100 GetFileById calls.
Here is the resulting graph.
The results are what you would expect. Both methods increase in a linear fashion based on the number of Files being returned. However, the Single Method is a much steeper line. In my case, it was about 17x steeper, and that was with client and server on the same machine.
The upper boundary
The array functions are good for moderate to large data sets. But what about very large data sets? For example, you want 1 million File objects. Problems start to arise when you try to do it all in a single call.
One common problem is the HTTP transfer limit, which is 50 MB. So if your data is larger than 50 MB when it gets transferred, you can't do it in one call. If you go over the limit, you will get an exception from the HTTP or WSE framework.
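As a rough back-of-the-envelope check, you can estimate how many objects fit under the transfer limit. In the sketch below, the per-object size is purely an assumption for illustration, and so is reading "50 MB" as 50 × 1024 × 1024 bytes. Measure your own serialized payloads before relying on any number like this.

```python
def max_items_per_call(limit_bytes, bytes_per_item):
    """Largest item count whose estimated payload stays within the limit."""
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    return limit_bytes // bytes_per_item


LIMIT_BYTES = 50 * 1024 * 1024  # the 50 MB limit, read as mebibytes (assumption)
ASSUMED_BYTES_PER_FILE = 2048   # illustrative guess, not a measured value

print(max_items_per_call(LIMIT_BYTES, ASSUMED_BYTES_PER_FILE))  # prints: 25600
```

The estimate ignores envelope overhead and varying object sizes, so treat it as a ceiling to stay well below rather than a target.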
Another problem is server performance. Don't forget that your application is probably not the only Vault client. There may be others using Vault at the same time, and you don't want to use up all the server cycles. Sometimes it's better to deliberately slow down your program to make the system more usable for everyone.
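One simple way to slow down deliberately is to pause between batched calls. The following is a sketch only: `fetch_politely`, its parameters, and the half-second default are hypothetical and not part of the Vault API. `fetch_batch` stands for whatever function performs one array call.

```python
import time


def fetch_politely(fetch_batch, batches, pause_seconds=0.5, sleep=time.sleep):
    """Call fetch_batch once per batch, pausing between calls.

    The pause gives other clients a chance to use the server. The sleep
    function is injectable so the pacing can be tested without waiting.
    """
    results = []
    for index, batch in enumerate(batches):
        if index > 0:
            sleep(pause_seconds)  # no pause before the first call
        results.extend(fetch_batch(batch))
    return results
```

The right pause length depends on your server and workload. The default here is a placeholder, not a recommendation.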
Lastly, there are timeouts in place, both at the web server and at the database level. I believe that our installer recommends 900 seconds (15 minutes) for the web server timeout. The database timeouts are more complex, but most queries have a 6 minute timeout.
For the GetFilesByIds case, I suggest 10,000 as your upper bound. This number may vary depending on how good or bad your system is, but overall you should be safe using this number.
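Applying that upper bound means splitting a large ID list into batches and making one array call per batch. Here is a minimal sketch, again in Python with hypothetical names: `get_files_by_ids` is any callable that behaves like GetFilesByIds, taking a list of IDs and returning a list of objects.

```python
def chunked(ids, size=10_000):
    """Yield consecutive slices of ids, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def get_files_in_batches(get_files_by_ids, file_ids, batch_size=10_000):
    """Fetch all files with one array call per batch of IDs."""
    files = []
    for batch in chunked(file_ids, batch_size):
        files.extend(get_files_by_ids(batch))
    return files


# Usage with a stand-in fetch function that records each call's batch size.
call_sizes = []

def fake_get_files_by_ids(batch):
    call_sizes.append(len(batch))
    return [{"Id": file_id} for file_id in batch]

files = get_files_in_batches(fake_get_files_by_ids, list(range(25_000)))
print(call_sizes)  # prints: [10000, 10000, 5000]
```

For 25,000 IDs this makes 3 calls instead of 25,000, while keeping each call at or below the suggested bound.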
Future Articles
This example was pretty basic since it was the first in the series. In future articles, you can look forward to more complex workflows with multiple solutions. I'll try my best to draw from real-world situations, so feel free to leave a comment with something you have had issues with.