October 3, 2022

Robotic Notes


Java String intern(): Performance impact – Java Code Geeks



java.lang.String#intern() is an interesting function in Java. Used in the right place, it can reduce the overall memory consumption of your application by eliminating duplicate strings. To learn how the intern() function works, you can refer to this blog. In this post, let’s discuss the performance impact of using the java.lang.String#intern() function in your application.
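As a quick refresher, the sketch below shows what "eliminating duplicate strings" means in practice: two String objects with identical contents are distinct objects until intern() maps both to one pooled instance. The class and method names are illustrative and do not come from the original post.

```java
// Minimal sketch; class and method names are illustrative, not from the article.
class InternBasics {

    // Returns true when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with identical contents.
        String first = new String("8f14e45f-ceea-467f-a9b1-0d2c5e7f3a11");
        String second = new String("8f14e45f-ceea-467f-a9b1-0d2c5e7f3a11");

        System.out.println(first.equals(second));                        // true: same characters
        System.out.println(sameObject(first, second));                   // false: two objects
        System.out.println(sameObject(first.intern(), second.intern())); // true: one pooled object
    }
}
```

Only the pooled instance needs to stay in memory once the two original objects become unreachable, which is where the saving comes from.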

Demonstration of the intern() function

To study the performance behavior of the intern() method, we created these two simple programs:

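// InternDemo.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class InternDemo {

   private static List<String> datas = new ArrayList<>(10_000_000);

   public static void main(String args[]) throws Exception {

      BufferedReader reader = new BufferedReader(new FileReader("C:\\workspace\\random-data.txt"));
      String data = reader.readLine();
      while (data != null) {
         // Store the canonical (interned) copy of the line, then read the next one
         datas.add(data.intern());
         data = reader.readLine();
      }
      reader.close();
   }
}

// NoInternDemo.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

public class NoInternDemo {

   private static List<String> datas = new ArrayList<>(10_000_000);

   public static void main(String args[]) throws Exception {

      BufferedReader reader = new BufferedReader(new FileReader("C:\\workspace\\random-data.txt"));
      String data = reader.readLine();
      while (data != null) {
         // Store the line exactly as read, without interning, then read the next one
         datas.add(data);
         data = reader.readLine();
      }
      reader.close();
   }
}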

Please review the source code above before reading further. Both programs are simple. “InternDemo” reads “random-data.txt” one line at a time and calls intern() on each line it reads. The string returned by intern() is then added to the “datas” ArrayList. “NoInternDemo” does exactly the same thing, with one difference: it does not call intern().

You should also understand the contents of “random-data.txt”. This file contains 10 million UUID (Universally Unique Identifier) strings. Although there are 10 million lines in the file, there is a huge amount of duplication among them: only 10 unique UUIDs are repeated to fill all 10 million lines. The data is intentionally structured this way so that the file contains a large number of duplicate strings. You can download the “random-data.txt” file we used for this experiment from this location.
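If the download is not available, a file with the same shape can be produced locally. The following is a minimal sketch, assuming each line is picked at random from a pool of 10 UUIDs; the post does not say how the original file was built, and the class name, method name, and output path here are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;
import java.util.UUID;

// Hypothetical helper; not part of the original experiment.
class RandomDataGenerator {

    // Writes totalLines lines to target, each drawn at random from a pool of uniqueCount UUIDs.
    static void generate(Path target, int uniqueCount, int totalLines) throws IOException {
        String[] pool = new String[uniqueCount];
        for (int i = 0; i < uniqueCount; i++) {
            pool[i] = UUID.randomUUID().toString();
        }
        Random random = new Random();
        try (BufferedWriter writer = Files.newBufferedWriter(target)) {
            for (int i = 0; i < totalLines; i++) {
                writer.write(pool[random.nextInt(uniqueCount)]);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed output path; the demo programs read C:\workspace\random-data.txt.
        generate(Paths.get("random-data.txt"), 10, 10_000_000);
    }
}
```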

Memory impact

We ran both programs. Before the programs exited, we captured a heap dump from each of them. A heap dump is basically a snapshot of memory that contains information about all the objects residing in memory. We analyzed the heap dumps using HeapHero, a heap dump analysis tool. Here are the live reports generated by this tool:

a. InternDemo Heap Analysis Report

b. NoInternDemo Heap Analysis Report

The table below summarizes the differences between the two programs:

Metric               InternDemo    NoInternDemo
Total size           38.37 MB      1.08 GB
Number of objects    4,184         20,004,164
Number of classes    456           456

You can see that “InternDemo” holds only about 4,000 objects consuming just 38.37 MB, whereas “NoInternDemo” holds more than 20 million objects consuming 1.08 GB of memory. Overall, “NoInternDemo” consumes roughly 28 times more memory than “InternDemo”. This demo clearly illustrates the memory savings achieved by the intern() function.
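The gap in object counts can be reproduced on a small scale without a heap dump. The sketch below counts distinct String objects by reference identity rather than by equals(). It is an illustration under simplified assumptions (strings are built in a loop instead of read from a file), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Illustrative sketch; not the tooling used in the article.
class DistinctInstanceCounter {

    // Counts distinct String objects by reference identity, not by equals().
    static int countInstances(List<String> values) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(values);
        return identities.size();
    }

    // Builds `total` strings cycling through `unique` values, optionally interning each one.
    static List<String> load(int unique, int total, boolean intern) {
        List<String> result = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            // Run-time concatenation creates a fresh String object on every iteration.
            String value = "id-" + (i % unique);
            result.add(intern ? value.intern() : value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countInstances(load(10, 100_000, false))); // 100000
        System.out.println(countInstances(load(10, 100_000, true)));  // 10
    }
}
```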

Impact of duplicate strings

The HeapHero tool shows in its report how much memory is wasted due to inefficient programming practices. We noticed that “InternDemo” does not waste any memory, while “NoInternDemo” wastes 1.04 GB (i.e. 97%) of memory due to inefficient programming practices. Of this 97% memory wastage, 96.5% is caused by duplicate strings.

Figure: InternDemo memory wastage reported by HeapHero

The tool also lists the duplicate strings present in memory. These are the 10 unique UUIDs from the random-data.txt file. Due to duplication, each of them was wasting 106 MB of memory. If the intern() operation had been used, this memory wastage could have been avoided.

Response Time Impact

We ran both the “InternDemo” and “NoInternDemo” programs several times. The table below shows the average response time of the two programs:

Program         Average response time
InternDemo      2042 ms
NoInternDemo    1164 ms

Overall, “InternDemo” was about 75% slower than “NoInternDemo”. This is because “InternDemo” has to look up each of the 10 million records in the JVM's internal string pool, which means hashing the string and comparing it with the pooled entries. That work consumes extra CPU and time, so the response time of “InternDemo” is slower than that of “NoInternDemo”. No wonder people say: “There is no free lunch”. “InternDemo” was efficient in terms of memory, while “NoInternDemo” was efficient in terms of CPU and response time.
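A rough way to observe this trade-off yourself is to time the same loop with and without intern(). The sketch below is not a rigorous benchmark (there is no warm-up, and JIT and GC effects are ignored), it builds strings in memory rather than reading the file, and all names are hypothetical. It also reproduces the 75% figure from the two averages reported above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative timing sketch; not the measurement setup used in the article.
class InternTiming {

    // Returns elapsed milliseconds for storing `total` strings, with or without intern().
    static long timeLoad(int unique, int total, boolean intern) {
        List<String> sink = new ArrayList<>(total);
        long start = System.nanoTime();
        for (int i = 0; i < total; i++) {
            String value = "id-" + (i % unique);
            sink.add(intern ? value.intern() : value);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Touch the list so the work above is not treated as unused.
        if (sink.size() != total) throw new IllegalStateException("unexpected size");
        return elapsedMs;
    }

    // Percentage by which `slower` exceeds `faster`.
    static double percentSlower(double slower, double faster) {
        return (slower - faster) / faster * 100.0;
    }

    public static void main(String[] args) {
        int unique = 10, total = 1_000_000;
        System.out.println("with intern():    " + timeLoad(unique, total, true) + " ms");
        System.out.println("without intern(): " + timeLoad(unique, total, false) + " ms");
        // (2042 - 1164) / 1164 is roughly 0.754
        System.out.println("article figures: about " + Math.round(percentSlower(2042, 1164)) + "% slower");
    }
}
```

Absolute numbers from a loop like this will vary by JVM version and hardware; for decisions that matter, a proper benchmarking harness is the safer choice.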

NOTE: The performance impact of the intern() function depends heavily on the data your application processes. In the example above, there were a large number of duplicate strings, which is why you saw such a large drop in memory usage and such an increase in response time. Other applications may not behave the same way. You should perform appropriate testing before using the intern() function in your application.
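One simple input to that testing is how much duplication a representative sample of your data contains. The sketch below computes a duplicate ratio for a list of strings; the class and method names are hypothetical, and any threshold for "worth interning" is left to your own measurements.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper; not part of the original experiment.
class DuplicateRatio {

    // Fraction of entries that repeat an earlier value:
    // 0.0 means every entry is unique, values near 1.0 mean mostly duplicates.
    static double duplicateRatio(List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        Set<String> unique = new HashSet<>(values);
        return 1.0 - (double) unique.size() / values.size();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "b", "a", "a");
        System.out.println(duplicateRatio(sample)); // 0.5 (2 unique values out of 4)
    }
}
```

For the experiment's data (10 unique values in 10 million lines) this ratio is very close to 1.0, which is the best case for intern(); data with a ratio near 0.0 would pay the lookup cost with little memory benefit.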


