OutOfMemoryError Follow-Up

After spending a great deal of time testing, looking… showering… I finally managed to locate the error that was causing the problem, though fully understanding it requires some knowledge of how the JVM works. At its core, however, is what I see as a bug in the Domino API. (NOT ODA!)

Just to quickly go through the environment: we are dealing with a Domino 9.0.1 server running the ExtLibs as well as ODA. The application in question uses only the ExtLibs; although ODA is installed on the server, it is not listed as a project dependency. ExtLibs is used only to get the current database. Some fix pack round about 42-ish is installed; I do not have the exact number memorized.

To reproduce the problem, I created a database with only two XPages and only two methods. The first method created 120,000 documents. Each document had only two fields that were manually set: Form and Status. To set the status, I used creatingDocNum % 3 to make sure that a third of all created documents had the same value, so we should end up with 40,000 documents with the status set to “0”, and so on.
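
In sketch form, the setup looked roughly like this (the method name, form value, and loop structure are my reconstruction, not the original source):

```java
import lotus.domino.Database;
import lotus.domino.Document;
import lotus.domino.NotesException;

public class TestDataSetup {
    /** Creates 120,000 test documents with an even three-way Status split. */
    public static void createTestDocuments(Database database) throws NotesException {
        for (int i = 0; i < 120000; i++) {
            Document doc = database.createDocument();
            doc.replaceItemValue("Form", "TestForm");
            // i % 3 guarantees the three-way split: 40,000 documents per status value
            doc.replaceItemValue("Status", String.valueOf(i % 3));
            doc.save(true, false);
            doc.recycle(); // release the back-end handle; crucial in a loop this size
        }
    }
}
```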

The next XPage executed a search over these documents, looking for all documents with that form name and the status “0”. As stated, there would have to be 40,000 hits in the database. Performing a lotus.domino.Database.search(String, DateTime, int) returns a lotus.domino.DocumentCollection. Its getCount() returned 500 (I used 500 as the maximum document count). While iterating over the documents, I put each universal ID (upper-cased) into a HashMap and counted the iterations. After each loop pass, I checked how much memory was remaining; once a certain minimum value was reached, I jumped out of the iteration loop. I then printed the HashMap size, the iteration count, and the value returned by the collection’s getCount(). The iteration count was well over the desired 500 documents (anywhere between 1,500 and 6,000, depending on the memory available), yet getCount() always returned 500. A PMR has been opened for this case.
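
Here is a hedged reconstruction of that reproduction code; the variable names and the 5 MB threshold are illustrative, not the original source:

```java
import java.util.HashMap;
import java.util.Map;
import lotus.domino.Database;
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

public class SearchBugRepro {
    public static void reproduce(Database database) throws NotesException {
        // maximum document count of 500; null date means "search everything"
        DocumentCollection col =
                database.search("Form = \"TestForm\" & Status = \"0\"", null, 500);
        System.out.println("getCount() says: " + col.getCount()); // prints 500

        Map<String, Boolean> unids = new HashMap<String, Boolean>();
        int iterations = 0;
        Document doc = col.getFirstDocument();
        while (doc != null) {
            unids.put(doc.getUniversalID().toUpperCase(), Boolean.TRUE);
            iterations++;
            // jump out before the heap runs dry
            if (Runtime.getRuntime().freeMemory() < 5L * 1024 * 1024) {
                break;
            }
            Document next = col.getNextDocument(doc);
            doc.recycle();
            doc = next;
        }
        // iterations lands between 1,500 and 6,000 -- getCount() still says 500
        System.out.println("unids: " + unids.size() + ", iterated: " + iterations
                + ", getCount(): " + col.getCount());
    }
}
```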

My work-around is two-pronged. The first bit is easy: I simply jump out of the iteration once enough documents have been iterated over. The second bit is that I constantly check how much memory is free; once I hit a minimum, I also jump ship. The appropriate message is displayed to the user, who can then refine the search or try again later.
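
As a sketch, the guard looks something like this (MAX_DOCS and MIN_FREE_BYTES are illustrative values, not the production numbers):

```java
public class SearchGuard {
    private static final int MAX_DOCS = 500;
    private static final long MIN_FREE_BYTES = 10L * 1024 * 1024; // 10 MB

    /** Returns true when the iteration loop should jump ship. */
    public static boolean shouldStop(int processedSoFar) {
        if (processedSoFar >= MAX_DOCS) {
            return true; // prong one: we already have all the documents we asked for
        }
        if (Runtime.getRuntime().freeMemory() < MIN_FREE_BYTES) {
            return true; // prong two: bail out before the heap runs dry
        }
        return false;
    }
}
```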

But this is sadly not enough for my ‘want-to-know-everything’ attitude (though in reality I can never know enough): during my testing I found that the available memory was always set back down to 64 MB…

Here is the point where JVM knowledge is paramount. The runtime always wants to keep the smallest memory footprint possible. To that end, when garbage collection is performed, the amount of memory available to the runtime is recalculated. If the available memory is too small, the runtime allocates more, up to the maximum configured value; if there is a bit of free memory, it lowers the available memory. All well and good… normally. But what if a few users go in and start a massive search at the same time? Because that is what they are there for: call up their data and have a good day. We could enter a situation where that 64 MB of RAM is just not going to cut it. Furthermore, because these massive calls happen simultaneously, the runtime is not going to allocate enough memory fast enough. Even though we set the notes.ini to allow a maximum of 512 MB, we get an OutOfMemoryError at only 64 MB.

Enter the XPages gods who have not only mastered development but are more than hip-deep in Domino administration… (in other words, Google and some awesome blogs…)

LET ME SAY THIS WITH EXTREME CAUTION!!!

Setting HTTPJVMMaxHeapSize=512M is not enough.
Setting JVMMinHeapSize=128M may be necessary.

I am always very careful before saying that we need to allocate more memory, because of how Domino works. I go through a checklist to verify the following:

  1. We are not throwing more memory at a memory leak. (Are we using recycle() appropriately and correctly? See the sketch after this list.)
  2. How many XPages applications are running on the server? (Each app [normally] runs a new JVM.)
  3. The server host can handle it, i.e. enough RAM is physically installed in the machine.
  4. The problem is not limited to a situation that can be fixed another way that also makes sense.
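
For item 1, this is the pattern I mean by using recycle() appropriately with the core lotus.domino API; a minimal sketch:

```java
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

public class RecycleExample {
    /** The standard recycle-in-a-loop pattern for the core lotus.domino API. */
    public static void iterate(DocumentCollection col) throws NotesException {
        Document doc = col.getFirstDocument();
        while (doc != null) {
            // ... read whatever you need from doc here ...
            Document next = col.getNextDocument(doc); // fetch the next one BEFORE recycling
            doc.recycle(); // release the back-end handle immediately
            doc = next;
        }
    }
}
```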

As a side note, I have found that this error occurs whether or not the OpenNTF Domino API is used. Naturally, I have spent more time reproducing the error for IBM with their API than with ODA.

So there we have it. A nice little bug that has been handed over to the guy with a fly-swatter. Happy Programming!

EDIT

The OutOfMemoryErrors were a result of processing the documents and putting fields of the documents into Java objects that were then stored in a List in the view or session scope. The OutOfMemoryError was not a direct result of performing the search; rather, it was caused by the bug: the search delivers a DocumentCollection object that holds more documents than it should, and its getCount() method returns the desired maximum, not the number of documents actually in the collection.

Memory: A little fucker that dies before its time, AKA Crap-Filled Bowling Balls

As anyone who knows me can tell you, I take great pride in every application that I work with. Since spearheading my company’s XPages development starting in 2010(ish), I have developed, analyzed, and fixed numerous apps. They are like little children that I send off into the real world.

So when I get reports that one of them is misbehaving, I get real defensive, real fast. It is, unfortunately, my downfall. However, every app that I need to re-evaluate is a learning opportunity, and I do treat it as such. *Spanks the bad app with a vengeance* (OK, not really.)

Bad jokes and terrible ideas aside, let me get into the issue at hand.

Let’s call this app ‘Waterfall Workflow’, or WWF. WWF is an application where I can expect a peak of about 800 concurrent users at its absolute extreme maximum; normal operations should be about half that number. Users sign in to a main application which holds no more than configuration information and which is responsible for the XPages and coding. Coding is done primarily in Java. All code uses ODA, or as it is officially named, the OpenNTF Domino API, or just THE API!!! (It all depends on who is speaking 😛 ) Hi Paul, David, and Nathan!!! It also makes heavy use of the Extension Libraries, but let’s forget that for a moment.

The data is contained in about four separate .nsf databases. Each database has a specific function, i.e. labels and languages, primary data, global configurations and database instances, etc. Because I do not want to build a database connection for every little piece of the puzzle, I lazily add every piece of configuration heaven into a cache in the application scope. This is done through a series of ‘Controller’-type Java classes. No worries, I do not load every possible piece of scrap into its own applicationScope variable; everything is neatly organized!!! (Your pride is showing… oops *zip*) The primary data is obviously not cached… why should it be…
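
To give an idea of what one of those ‘Controller’ classes looks like, here is a minimal sketch, assuming an application-scoped managed bean registered in faces-config.xml (the class name, method names, and label example are hypothetical):

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LabelController implements Serializable {
    private static final long serialVersionUID = 1L;

    // label key -> translated text, loaded lazily from the Labels/Languages nsf
    private final Map<String, String> labelCache =
            new ConcurrentHashMap<String, String>();

    public String getLabel(String key) {
        String value = labelCache.get(key);
        if (value == null) {
            value = loadLabelFromNsf(key); // hypothetical lookup in the config nsf
            labelCache.put(key, value);
        }
        return value;
    }

    private String loadLabelFromNsf(String key) {
        // ... open the configuration database and read the label document ...
        return key; // placeholder
    }
}
```

A ConcurrentHashMap keeps the cache safe when several sessions hit the application scope at the same time.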

So all is fine and dandy until I decide to build an advanced search. Should be easy, right??? Yeah, why not. So let’s look at my solution and take a look at some more specifics of WWF.

  1. We are dealing with an application with heavy Author/Reader field usage. (Well, there goes performance, but there is not too much I can do there… I think…)
  2. We are dealing with approximately 60,000 documents worth of primary data. (Remember, other information is cached as it is fetched from the other .nsfs.)
  3. Each primary data document may hold a single attachment, and every primary data document may have a separate document (linked via a key) containing up to 10 attachments. This gives us a theoretical maximum of about 120,000 documents, though the actual count is likely closer to 80,000.
  4. The search is done in such a way that the query could theoretically have 20,000 hits or more.

Production Server Info

  • Two XPages applications run on this server.
  • We are dealing with 64-bit Windows and 64-bit Domino 9.0.1 FP3, plus some odd-numbered hotfix pack.
  • October 2015 ExtLibs and ODA.

The implementation of the advanced search is pretty simple. A user can select values for up to 10 or so fields contained in the primary data document. (There are a total of about 120-odd fields per document.) Depending on the user’s selection, a DbSearch is performed after building a query string. Although I cannot remember why, I know that a full text index of the database was not built, and one is not desired. The DbSearch is set to return a maximum of 500 documents. Depending on the selection of the user, a further search is performed on an archive which contains old data.
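
In sketch form, the search looks something like this (the form name, the selections map, and the class name are invented for illustration; the 500-document cap is the real one):

```java
import java.util.Map;
import org.openntf.domino.Database;
import org.openntf.domino.DocumentCollection;

public class AdvancedSearch {
    /** Builds a selection formula from the user's choices and runs the DbSearch. */
    public static DocumentCollection run(Database db, Map<String, String> selections) {
        StringBuilder query = new StringBuilder("Form = \"PrimaryData\"");
        for (Map.Entry<String, String> entry : selections.entrySet()) {
            // each selected field narrows the formula with another AND clause
            query.append(" & ").append(entry.getKey())
                 .append(" = \"").append(entry.getValue()).append("\"");
        }
        // formula-based search: works without a full text index, capped at 500 hits
        return db.search(query.toString(), null, 500);
    }
}
```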

As previously stated, all actions on Domino data are performed using THE API (ODA), which of course includes the search. Once the search delivers the document collection, an iterator is created, and the documents are read out one by one into Java objects, which are then stored in a list. These Java objects contain roughly 15 string attributes, and we are talking about a maximum of 1,000 returned documents (2 searches, each returning 500 documents). This is nothing groundbreaking. The list is stored in a session-scoped controller (so that the view can be closed and re-opened without performing the search a second time). We found no issues testing with up to 10 people in the testing environment. We let this functionality go live, and BAM!!!!!!!! OutOfMemoryErrors hit us (OK, hit me) like a ton of crap-filled bowling balls, and I still cannot get the stench off of me. Design restore. Wash. Rinse. Rethink…
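
The read-out step, sketched with ODA idioms (the DTO and its field names are hypothetical):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import org.openntf.domino.Document;
import org.openntf.domino.DocumentCollection;

public class SearchResultReader {

    /** Hypothetical DTO; the real one holds roughly 15 string attributes. */
    public static class SearchResult implements Serializable {
        private static final long serialVersionUID = 1L;
        private String title;
        private String status;
        // ... plus getters and roughly 13 more string attributes ...
        public void setTitle(String title)   { this.title = title; }
        public void setStatus(String status) { this.status = status; }
    }

    public static List<SearchResult> readOut(DocumentCollection col) {
        List<SearchResult> results = new ArrayList<SearchResult>();
        for (Document doc : col) { // ODA: the for-each loop handles recycling for you
            SearchResult row = new SearchResult();
            row.setTitle(doc.getItemValueString("Title"));   // field names invented
            row.setStatus(doc.getItemValueString("Status"));
            results.add(row);
        }
        return results; // afterwards stored in a session-scoped controller
    }
}
```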

Since the design update included numerous little changes, I first had to localize the problem. JMeter to the rescue in our confined QA environment, which (as far as I can tell) is a 1-to-1 mock-up of the final server: same OS, same hardware specs, same (at least where it counts) config, same OSGi plug-ins.

After setting up a test plan where x dummy users log in, go to the search page, and submit a search request via AJAX, I thought it would be a good idea to set x to 100 users. (All of them use the same credentials, but judging by the cookies, each is on its own individual session.) No more than 10 search requests were submitted before BAM!!!!! Another ton of crap-filled bowling balls. Server restart. Wash. Rinse. Repeat.

So, where am I going wrong then?

I quickly built another app in the QA system containing one XPage, no configuration cache, and only a DbSearch and a dummy Java object being saved in the session scope. So far only ODA was tested, and the same function construction was emulated (obviously without the extra final-version finesse). Same problem. Next step: find out which step in the code is causing the error, and by the way, let’s cut it down to a simple 5 or 10 dummy users.

Before I go further, I want to explain the princess that is the JVM. She has a maximum memory (this is how big her brain is), an available memory (how much she is willing to give you at the moment, but she’ll give you more if you need it), and a used memory (how much she is actually thinking about your lovely self). Let’s expand this into the Domino world, and we have two notes.ini variables with which we can play: HTTPJVMMaxHeapSize and its buddy HTTPJVMMaxHeapSizeSet. On a 64-bit system, you can play with this a bit. The default is 256M, referring to a total maximum runtime memory of 256 MB, and its buddy, when set to 1 (as far as I know), tells Domino not to reset the max heap size. Don’t quote me on that though; it has been a while since reading Paul Withers’ awesome XPages book.
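
In Java terms, those three values come straight from Runtime. Here is a minimal helper of the kind I used (my reconstruction, not the original code) that prints exactly the trio listed below:

```java
public class MemoryLogger {
    private static final long MB = 1024L * 1024L;

    /** Logs the heap trio, converted to MB. */
    public static void logMemory(String label) {
        Runtime rt = Runtime.getRuntime();
        System.out.println(label
                + " | free: "  + (rt.freeMemory()  / MB) + " MB"   // unused part of the current heap
                + " | total: " + (rt.totalMemory() / MB) + " MB"   // what she is willing to give right now
                + " | max: "   + (rt.maxMemory()   / MB) + " MB"); // how big her brain can get
    }
}
```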

After every critical call, and after every 100th document being iterated over, I printed:

  1. the free memory, converted to MB
  2. the total available memory, converted to MB
  3. the maximum memory

From the beginning on, I had only about 35 MB of free memory, 64 MB available, and a maximum of 256 MB. I played with the setting, going up to 512 MB, and then to a total of 1024 MB. I found a few interesting things:

  1. Viewing the Task Manager resource/performance panel, the memory usage on the server never exceeded roughly 4 GB of the available 16 GB of RAM.
  2. The available memory never exceeded 64 MB.
  3. The free memory (OK, I obviously was not seeing every millisecond’s value) never went below 5 MB.
  4. On the server console, the iteration looping continued even while I was reading the bloody OutOfMemoryError crap-filled bowling ball message.

I am left with an interesting challenge.  What is the cause of this stupid bowling-ball shower? The following thoughts are going through my head…

  1. Is a Domino configuration setting messing with me, and is that why the available memory is not increasing to match my current needs?
  2. Am I doing something wrong with the loop?
  3. Is it possible that the problem is not with me, but with my tools?
  4. Is it possible that ODA cannot recycle the objects fast enough to handle 10 concurrent requests, each performing a DbSearch over approximately 80,000 documents and returning a maximum of 500?
  5. Is it possible that the OSGi runtimes are not getting the memory they need to run? If so, why would they not take the same value as is written in the notes.ini?
  6. What the fuck am I missing?
  7. How do I get the smell of crap-filled bowling ball off of me? Does tomato juice work?

As you can tell, I am still trying to figure this out. I don’t expect you to have learned anything from this, but at least I got my thoughts out.

I am going to try taking yet another shower.