As anyone who knows me can tell, I take great pride in every application that I work with. Since spearheading my company’s XPage development starting in 2010(ish), I have developed, analyzed, and fixed numerous apps. They are like little children that I send off into the real world.
So when I get reports that one of them is misbehaving, I get real defensive, real fast. It is, unfortunately, my downfall. However, every app that I need to re-evaluate is a learning opportunity and I do treat it as such. *Spanks the bad app with a vengeance* (Ok, not really)
Bad jokes and terrible ideas aside, let’s get to the issue at hand.
Let’s call this app ‘Waterfall Workflow’ or WWF. WWF is an application where I can expect a peak of about 800 concurrent users at its absolute extreme maximum. Normal operations should be about half of that number. Users sign in to a main application which holds no more than configuration information and which is responsible for the XPages and coding. Coding is done primarily in Java. All code uses ODA, or as it is officially named, the OpenNTF Domino API, or just THE API!!! (It all depends on who is speaking 😛 ) Hi Paul, David, and Nathan!!! It also makes heavy use of the Extension Library, but let’s forget that for a moment.
The data is spread across about 4 separate .nsf databases. Each database has a specific function, e.g. Labels and Languages, Primary Data, Global configurations and database instances, etc. Because I do not want to build a database connection for every little piece of the puzzle, I lazily add every piece of configuration heaven into a cache in the application scope. This is done through a series of ‘Controller’ type Java classes. No worries, I do not load every possible piece of scrap into its own AS variable. Everything is neatly organized!!! (Your pride is showing…. oops *zip*) The primary data is obviously not cached…. why should it be….
So all is fine and dandy until I decide to build an advanced search. Should be easy, right??? Yeah, why not. So let’s look at my solution and at some more specifics of WWF.
- We are dealing with an application with heavy Author/Reader field usage. (well there goes performance, but there is not too much I can do there…. I think…)
- We are dealing with approximately 60,000 documents worth of primary data. (Remember other information is stored in cache as fetched from other nsfs)
- Each primary data document may hold a single attachment, and every primary data document may have a separate document (linked via a key) containing up to 10 attachments. This gives us a max total of about 120,000 possible documents where the actual value is likely closer to roughly 80,000.
- The search is done in such a way that the query could have 20,000 hits or more. (theoretical)
Production Server Info
- 2 XPage applications are run on this server.
- We are dealing with 64-bit Windows and 64-bit Domino 9.0.1 FP3 and some odd-numbered hot-fix pack
- October 2015 ExtLibs, and ODA
The implementation of the advanced search is pretty simple. A user can select values for up to 10 or so fields contained in the primary data document. (There are a total of about 120 odd fields per document) Depending on the user’s selection, a query string is built and a DbSearch is performed with it. Although I cannot remember why, I know that a full text index of the database was not built, and one is not desired. The DbSearch is set to return a maximum of 500 documents. Depending on the selection of the user, a further search is performed on an archive which contains old data.
As previously stated, all actions on Domino data are performed using THE API (ODA). This of course includes the search. Once the search delivers the document collection, an iterator is created over which the documents are read out one by one into Java objects, which are then stored in a list. These Java objects contain roughly 15 string attributes, and we are talking about a maximum of 1000 returned documents. (2 searches, each returning 500 documents) This is nothing ground breaking. This list is stored in a session scoped controller (so that the view can be closed and re-opened without performing a search a second time). We found no issues testing with up to 10 people in the testing environment. We let this functionality go live, and BAM!!!!!!!! OutOfMemoryErrors hit us (ok, hit me) like a ton of crap-filled bowling balls and I still cannot get the stench off of me. Design restore. Wash. Rinse. Rethink…..
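For illustration, the read-out loop described above can be sketched roughly like this. It is a minimal stand-in, not the real WWF code: the `SearchCollector` and `SearchHit` names are hypothetical, the `String[]` rows are an assumed placeholder for Domino documents, and the real objects carry roughly 15 strings instead of two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SearchCollector {

    /** Hypothetical lightweight result object; the real one holds roughly 15 strings. */
    public static final class SearchHit {
        private final String id;
        private final String subject;

        public SearchHit(String id, String subject) {
            this.id = id;
            this.subject = subject;
        }

        public String getId() { return id; }
        public String getSubject() { return subject; }
    }

    /**
     * Reads at most maxHits entries from the source and copies them into plain objects.
     * The String[] rows stand in for Domino documents (an assumption for this sketch).
     */
    public static List<SearchHit> collect(Iterator<String[]> source, int maxHits) {
        List<SearchHit> hits = new ArrayList<SearchHit>();
        while (source.hasNext() && hits.size() < maxHits) {
            String[] row = source.next();
            // Copy only the values needed for display; keep no reference to the source row.
            hits.add(new SearchHit(row[0], row[1]));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String[]> fake = Arrays.asList(
            new String[] {"doc-1", "First"},
            new String[] {"doc-2", "Second"},
            new String[] {"doc-3", "Third"});
        // Cap the result at 2 hits, the same way the real search is capped at 500.
        List<SearchHit> hits = collect(fake.iterator(), 2);
        System.out.println(hits.size() + " hits, first = " + hits.get(0).getId());
    }
}
```

The point of the shape is that the list handed to the session scope contains only plain strings, so nothing in it should depend on a Domino handle staying alive.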
Since the design update included numerous little changes, I first had to isolate the problem. JMeter to the rescue in our confined QA environment which (as far as I can tell) is a 1-to-1 mock-up of the final server. Same OS, same hardware specs, same (at least where it counts) config. Same OSGi plug-ins.
After setting up a test plan where x dummy users log in, go to the search page, and submit a search request via AJAX, I thought it would be a good idea to set x to 100 users. (All of which are using the same credentials, but by checking the cookies they are all on their own individual sessions) No more than 10 search requests were submitted before BAM!!!!! Another ton of crap-filled bowling balls. Server restart, Wash, Rinse, Repeat.
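As a back-of-the-envelope check on the result lists held in session scope, the following sketch estimates their size. Everything in it is an assumption for illustration: the `SessionFootprint` class is hypothetical, the 40 bytes of per-string overhead and the 50 characters per string are guessed round numbers rather than measurements, and only the 1000 hits and 15 strings come from the description of the search.

```java
public class SessionFootprint {

    // Rough assumed constants, not measured values.
    static final long STRING_OVERHEAD_BYTES = 40; // assumed per-String object overhead
    static final long BYTES_PER_CHAR = 2;         // a Java char is 16 bits

    /** Estimated bytes for one result object holding stringsPerHit strings. */
    public static long bytesPerHit(int stringsPerHit, int avgChars) {
        return stringsPerHit * (STRING_OVERHEAD_BYTES + BYTES_PER_CHAR * avgChars);
    }

    /** Estimated bytes for the whole result list of one session. */
    public static long bytesPerSession(int hits, int stringsPerHit, int avgChars) {
        return hits * bytesPerHit(stringsPerHit, avgChars);
    }

    public static void main(String[] args) {
        // 1000 hits and 15 strings are from the search description; 50 chars is a guess.
        long perSession = bytesPerSession(1000, 15, 50);
        System.out.println("per session: " + perSession / 1024 + " KB");
        System.out.println("100 sessions: " + (100 * perSession) / (1024 * 1024) + " MB");
    }
}
```

With these guessed inputs the estimate is about 2 MB per session and about 200 MB for 100 full sessions. That says nothing definite about the actual cause, but it is a number worth holding next to the configured heap size.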
So, where am I going wrong then?
I quickly built another app in the QA system containing one XPage, no configuration cache, and only a DbSearch and a dummy Java object being saved in the session scope. So far, only ODA was tested, and the same function construction was emulated. (obviously without the extra final version finesse) Same problem. Next step, find out which step in the code is causing the error, and by the way, let’s cut it to a simple 5 or 10 dummy users.
Before I go further, I want to explain the princess that is the JVM. She has a maximum memory (this is how big her brain is), she has an available memory (how much she is willing to give you at the moment, but she’ll give you more if you need it), and a used memory (how much she is actually thinking about your lovely self). Let’s expand this into the Domino world, and we have two notes.ini variables with which we can play: HTTPJVMMaxHeapSize and its buddy HTTPJVMMaxHeapSizeSet. On a 64-bit system, you can play with this a bit. Its default is 256M, referring to a total maximum runtime memory of 256MB, and its buddy, when set to 1 (as far as I know), tells Domino not to reset the max heap size. Don’t quote me on that though, it has been a while since reading Paul Withers’ awesome XPage book.
After every critical call, and after every 100th document being iterated over, I printed:
- free memory calculated to MB
- total available memory calculated to MB
- maximum memory
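A minimal sketch of that kind of logging, assuming the three values are read from the standard `java.lang.Runtime` methods `freeMemory()`, `totalMemory()`, and `maxMemory()`. The `MemoryProbe` class and its method names are hypothetical, and the loop body is only a placeholder for the real document read-out.

```java
public class MemoryProbe {

    private static final long MB = 1024L * 1024L;

    /** Formats the three JVM heap figures in megabytes. */
    public static String snapshot(String label) {
        Runtime rt = Runtime.getRuntime();
        long free = rt.freeMemory() / MB;    // unused part of the currently allocated heap
        long total = rt.totalMemory() / MB;  // heap currently allocated by the JVM
        long max = rt.maxMemory() / MB;      // upper limit the heap may grow to
        return label + ": free=" + free + "MB, total=" + total + "MB, max=" + max + "MB";
    }

    /** True for every 100th item (1-based count), matching the logging interval. */
    public static boolean shouldLog(int count) {
        return count > 0 && count % 100 == 0;
    }

    public static void main(String[] args) {
        System.out.println(snapshot("before loop"));
        for (int i = 1; i <= 500; i++) {
            // ... read one document into a Java object here (placeholder) ...
            if (shouldLog(i)) {
                System.out.println(snapshot("after " + i + " documents"));
            }
        }
    }
}
```

Note that the memory actually in use is `totalMemory() - freeMemory()`, and that `totalMemory()` is allowed to grow towards `maxMemory()` as the JVM needs more heap.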
From the beginning, I only had about 35MB of free memory, 64MB available, and a maximum of 256MB. I played with the setting, raising the maximum to 512MB and then to 1024MB. I found a few interesting things:
- Viewing the task manager resource/performance panel, the memory usage on the server never exceeded roughly 4 GB of the available 16 GB RAM.
- The available memory never exceeded 64MB
- The free memory (ok, I obviously was not seeing every millisecond’s value) never went below 5MB.
- On the server console, the iteration looping continued although I was also reading the bloody OutOfMemoryError crap-filled bowling ball message.
I am left with an interesting challenge. What is the cause of this stupid bowling-ball shower? The following thoughts are going through my head…
- Is a domino configuration setting messing with me, and is that why the available memory is not increasing to match my current needs?
- Am I doing something wrong with the loop?
- Is it possible that the problem is not with me, but with my tools?
- Is it possible that ODA cannot recycle the objects fast enough to handle 10 concurrent requests to perform a function which does a DbSearch over approximately 80,000 documents and returns a maximum of 500?
- Is it possible that the OSGi Runtimes are not getting the memory they need to run? If not, why would that not take the same value as is written in the notes.ini?
- What the fuck am I missing?
- How do I get the smell of crap-filled bowling ball off of me? Does tomato juice work?
As you can tell, I am still trying to figure this out. I don’t expect you to have learned anything from this, but at least I got my thoughts out.
I am going to try taking yet another shower.