Geek's Pearls: 2016

Wednesday, August 24, 2016

Configure JProfiler 9.2 to profile applications running in Docker containers

Recently, I worked on a memory issue in our applications and used JProfiler 9.2 to analyze the memory usage. Our applications run in Docker containers, so I had to attach JProfiler to a remote JVM to do the profiling. Below is a step-by-step guide to making JProfiler 9.2 work with Docker. P.S. I'm using a Linux system.

These steps are to be done in Docker containers:

1. Install JProfiler 9.2 in the Docker image and expose port 8849 by adding the following lines to the Dockerfile, then rebuild the Docker image.

RUN wget http://download-keycdn.ej-technologies.com/jprofiler/jprofiler_linux_9_2.tar.gz -P /tmp/ &&\
 tar -xzf /tmp/jprofiler_linux_9_2.tar.gz -C /usr/local &&\
 rm /tmp/jprofiler_linux_9_2.tar.gz

ENV JPAGENT_PATH="-agentpath:/usr/local/jprofiler9/bin/linux-x64/libjprofilerti.so=nowait"
EXPOSE 8849

2. Start the Docker container.

As Will Humphreys points out in the comments below, start your Docker container with port 8849 mapped to port 8849 on the host.
docker run -p 8849:8849 imageName

If Docker Compose is in use, map port 8849 to host port 8849 by adding "8849:8849" to the ports section of the docker-compose file.

ports:
- "8849:8849"

3. Get inside the Docker container by running the command below.

docker exec -it [container-name] bash

4. Start JProfiler's attach mode by running these commands inside the Docker container.

cd /usr/local/jprofiler9/
bin/jpenable

JProfiler should prompt you to enter the mode and the port. Enter '1' and '8849' as shown in the screenshot below.

Then you should see the JProfiler log output in your application server's log. See the example screenshot below.

Alternatively, if you want to enable the JProfiler agent when your web server starts and have it wait for the JProfiler GUI to connect from the host, do not put "ENV JPAGENT_PATH="-agentpath:/usr/local/jprofiler9/bin/linux-x64/libjprofilerti.so=nowait"" in the Dockerfile. Instead, add the following line to JAVA_OPTS (for Tomcat, use CATALINA_OPTS). Note: config.xml is the place to put your JProfiler license key.

JAVA_OPTS="$JAVA_OPTS -agentpath:/usr/local/jprofiler9/bin/linux-x64/libjprofilerti.so=port=8849,wait,config=/usr/local/jprofiler9/config.xml"

Now you are done on the Docker container side. The container is ready for your JProfiler GUI to attach to it. The steps below are to be done on the host machine.

1. Download JProfiler 9.2 from https://www.ej-technologies.com/download/jprofiler/files and install it.
2. Open JProfiler and start a new session by pressing Ctrl + N or clicking 'New Session' in the Session menu.
3. Select 'Attach to profiled JVM (local or remote)' in the Session Type section. Enter the IP address and 8849 as the profiling port in the Profiled JVM Settings section. Leave the other settings at their defaults, then click OK.

If you don't know the IP address of the Docker container, go inside it and run 'ifconfig'. If 'ifconfig' is not found, install it with 'yum -y install net-tools' on CentOS, or with the equivalent command on other systems.

4. A Session Startup window should appear. Keep all the default settings and click OK.

JProfiler should start to transform classes and connect to your JVM in the Docker container.

Once the connection is established, you should see the profiling charts.
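If the session fails to connect, it can help to first check that the profiling port is reachable from the host. The snippet below is a minimal sketch using only Python's standard library; the function name port_is_open is made up for this example, and the default port 8849 simply mirrors the port used in this guide.

```python
import socket


def port_is_open(host: str, port: int = 8849, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection completes the TCP handshake, so success means
        # something on the other side accepted the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, port_is_open("127.0.0.1") checks the port published with -p 8849:8849. A True result only shows that something accepts TCP connections on that port; it does not prove that the JProfiler agent is the one listening.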

P.S. If you have a license key, you can enter it into the JProfiler installation inside the Docker container by opening $JPROFILER_HOME/config.xml and inserting your key there as shown below. If config.xml does not exist, copy it from $HOME/.jprofiler9 on your host machine.

...

Friday, July 8, 2016

Graph - Introduction

In this post, I would like to give a simple description of a data structure: the graph. There are several useful graph algorithms, and I will talk about them later.

Firstly, what is a graph?

The following two figures show two simple graphs.

Graph 1

Graph 2

You may notice the difference between the two graphs above. Let's look at the formal definition.

Graph
A graph G = (V, E) consists of a finite set of vertices (or nodes) V = {$v_1$, $v_2$, ..., $v_n$} and a set of edges E. Graph 1 in the figures above is an undirected graph: each edge in E is an unordered pair of vertices. Graph 2 is a directed graph: each edge in E is an ordered pair of vertices.

An undirected graph is said to be complete if there is an edge between each pair of its vertices. A directed graph is said to be complete if there is an edge from each vertex to all other vertices.

Are the above two graphs complete? Yes to the undirected graph and no to the directed graph.
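The two definitions of completeness can be checked mechanically. The following Python functions are a minimal sketch, assuming vertices are given as a list of distinct hashable labels and edges as pairs of those labels; the function names are made up for this example, and the graphs in the figures are not reproduced here.

```python
from itertools import combinations, permutations


def is_complete_undirected(vertices, edges):
    """Every unordered pair of distinct vertices must be joined by an edge."""
    # frozenset makes (u, v) and (v, u) compare equal, as in an undirected graph.
    edge_set = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))


def is_complete_directed(vertices, edges):
    """Every ordered pair (u, v) with u != v must appear as an edge."""
    edge_set = {tuple(edge) for edge in edges}
    return all(pair in edge_set for pair in permutations(vertices, 2))
```

With vertices [1, 2, 3] and edges [(1, 2), (2, 3), (1, 3)], the undirected check succeeds, while the directed check fails because edges such as (2, 1) are missing.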

Representation of graphs

There are two commonly used data structures to represent a graph.

Adjacency matrix
The adjacency matrix M of a graph G is a boolean matrix in which M[i, j] = 1 if and only if ($v_i, v_j$) is an edge in G.

Adjacency list
An adjacency list is a collection of linked lists, one per vertex, where each list holds the vertices adjacent to that vertex.

The figures below show two representations of an undirected graph and a directed graph.

Undirected graph

Directed graph

A Java implementation of the above graphs can be found on GitHub - Adjacency Matrix and Adjacency List.
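For readers who prefer Python, here is a short sketch of both representations. It is not the GitHub implementation mentioned above. It assumes vertices are numbered 0 to n-1, uses plain Python lists in place of linked lists, and the function names are made up for this example.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n boolean matrix for vertices numbered 0..n-1."""
    matrix = [[False] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = True
        if not directed:
            matrix[v][u] = True  # an undirected edge is symmetric
    return matrix


def adjacency_list(n, edges, directed=False):
    """Map each vertex 0..n-1 to the list of its adjacent vertices."""
    adjacent = {vertex: [] for vertex in range(n)}
    for u, v in edges:
        adjacent[u].append(v)
        if not directed and u != v:
            adjacent[v].append(u)  # record the reverse direction once
    return adjacent
```

The matrix takes space proportional to n squared but answers "is (u, v) an edge?" with a single lookup. The list takes space proportional to the number of vertices plus edges, which is usually the better choice for sparse graphs.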

Saturday, June 25, 2016

How to package a Python module

Python is an interesting programming language. It lets you implement solutions to simple tasks in very little time, and there are many useful modules you can reuse. Below is a simple way to package your Python application as a module, so that you can distribute it and others can use it.

It is recommended to organize your Python application in the following project structure

my-project
---mypackage
------__init__.py
------__main__.py
------package-data
---------package.conf
---------package.txt
------mainscript.py
---mypackage-runner.py
---setup.py

Below is a very simple example of the setup.py script. More details about the setup script are here.

setup.py:

from setuptools import setup, find_packages

setup(
    name='my-project',
    packages=find_packages(),
    description='my python project',
    entry_points={
        "console_scripts": ['mypackage = mypackage.mainscript:main']
    },
    version='1.0.0',
    classifiers=[
        'Development Status :: 4 - Beta',
        'Programming Language :: Python :: 3'],
    install_requires=[
        'requests'
    ],
    package_data={
        'mypackage': ['package-data/package.conf',
                        'package-data/*.txt']
    },
    author='Andrew Liu')

Some important values are explained below.

1. entry_points

This defines the entry points of your code. "console_scripts" maps a command name to the function to execute when your application is called from the command line. In the example, the mypackage command runs the main() function in mypackage/mainscript.py.

2. install_requires

This defines all the dependencies of your module.

3. package_data

Any non-Python files that you want to include in your package need to be listed here.
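The contents of mypackage/mainscript.py are not shown in this post, so the following is only a sketch of what a main() function matching the 'mypackage.mainscript:main' entry point could look like. The --name option and the greeting are hypothetical and exist only to illustrate argument handling.

```python
import argparse


def main(argv=None):
    """Hypothetical entry point matching 'mypackage.mainscript:main'."""
    parser = argparse.ArgumentParser(prog="mypackage")
    # --name is a made-up option, used only to show argument handling.
    parser.add_argument("--name", default="world")
    # With argv=None, argparse reads the arguments from sys.argv[1:].
    args = parser.parse_args(argv)
    print(f"Hello, {args.name}!")
    return 0  # the generated console script passes this value to sys.exit()
```

Accepting an optional argv makes the function easy to call from tests, for example main(["--name", "Andrew"]). In this layout, __main__.py and mypackage-runner.py would typically just import main from mypackage.mainscript and call it.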

To install your Python module

cd path-to-my-project
python setup.py install

To run the package in your project

python -m mypackage

To run the wrapper script

python mypackage-runner.py

To check the installation

command -V mypackage

To run the installed command

mypackage

Wednesday, February 3, 2016

How Java garbage collection works

As Java developers, we all know the JVM provides an automatic garbage collection (GC) mechanism, so we don't need to worry about memory allocation and deallocation as we do in C. But how does GC work behind the scenes? Understanding that helps us write much better Java applications.

There are many articles on the web that dive deep into this topic; I will only cover some GC basics in this post. First, you might have heard the term "stop-the-world". What does that mean? It means the JVM stops running the application in order to execute a GC. During the stop-the-world pause, every application thread stops its work until the GC threads complete theirs.

JVM Generations

In Java, we don't explicitly allocate and deallocate memory in the code. The GC finds those unreferenced objects and removes them. According to an article by Sangmin Lee[1], the GC was designed by following the two hypotheses below.

Most objects soon become unreachable.

References from old objects to young objects only exist in small numbers.

Therefore, the heap is divided into segments, which Java calls generations.

Young Generation: All new objects are allocated in Young Generation. When this area is full, GC removes unreachable objects from it. This is called "minor garbage collection" or "minor GC".

Old Generation: Objects that survive the Young Generation are moved to the Old Generation, also called the Tenured Generation. The Old Generation is larger, and GC removes objects from it less frequently. When GC removes objects from the Old Generation, it is called "major garbage collection" or "major GC".

Permanent Generation: The Permanent Generation contains metadata of classes and methods, so it is also known as the "method area". It does not store objects that survived the Old Generation. A GC that occurs in this area is also considered a "major GC". Some sources call a GC a "full GC" if it runs on the Permanent Generation. Since Java 8, the Permanent Generation has been replaced by Metaspace, which is allocated in native memory.

The Young Generation is further divided into an Eden space and two Survivor spaces. They are used to track the age of objects and to decide whether to move them to the Old Generation.

Generational Garbage Collection

Now, how does the GC work with these generations in the heap?

1. Newly created objects are allocated in the Eden space. The two Survivor spaces are empty at the beginning.

2. When Eden space is full, a minor GC occurs. It deletes all unreferenced objects from Eden space and moves referenced objects to the first survivor space (S0). So the Eden space will be empty and new objects can be allocated to it.

3. When Eden space is full again, another minor GC occurs. It deletes all unreferenced objects from Eden space and moves referenced objects. But this time, referenced objects are moved to the second survivor space (S1). In addition, referenced objects in the first survivor space (S0) also get moved to S1 and have their age incremented. Unreferenced objects in S0 also get deleted. So we always have one survivor space empty.

4. The same process repeats in subsequent minor GCs, with the survivor spaces switching roles each time.

5. When the age of an object in the survivor spaces reaches a threshold, the object is moved to the Old Generation.

6. When the Old Generation is full, a major GC will be performed to delete the unreferenced objects in Old Generation and compact the referenced objects.
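Steps 1 to 5 above can be illustrated with a toy simulation. This is only a sketch to make the bookkeeping concrete, not how a real JVM is implemented: objects are plain names, liveness is supplied by the caller through an is_live function instead of being traced from roots, survivor spaces have unlimited capacity, and there is no major GC. The class name ToyHeap and its parameters are made up for this example.

```python
class ToyHeap:
    """Toy model of the Young Generation; objects are names with an age."""

    def __init__(self, eden_capacity=4, tenuring_threshold=2):
        self.eden_capacity = eden_capacity
        self.tenuring_threshold = tenuring_threshold
        self.eden = []            # newly allocated object names
        self.survivor_from = {}   # name -> age; the occupied survivor space
        self.survivor_to = {}     # the survivor space kept empty between GCs
        self.old = []             # names promoted to the Old Generation
        self.minor_gc_count = 0

    def allocate(self, name, is_live):
        """Allocate in Eden, running a minor GC first if Eden is full."""
        if len(self.eden) >= self.eden_capacity:
            self.minor_gc(is_live)
        self.eden.append(name)

    def minor_gc(self, is_live):
        """Copy live objects to the empty survivor space or promote them."""
        self.minor_gc_count += 1
        # Objects coming from Eden get age 1; existing survivors age by one.
        candidates = [(name, 1) for name in self.eden]
        candidates += [(name, age + 1) for name, age in self.survivor_from.items()]
        for name, age in candidates:
            if not is_live(name):
                continue  # unreferenced objects are not copied, i.e. deleted
            if age > self.tenuring_threshold:
                self.old.append(name)         # step 5: promotion
            else:
                self.survivor_to[name] = age  # steps 2 and 3: copy survivor
        self.eden = []
        self.survivor_from = {}
        # Step 4: swap roles so that one survivor space is always empty.
        self.survivor_from, self.survivor_to = self.survivor_to, self.survivor_from
```

With eden_capacity=2 and tenuring_threshold=2, an object that stays live is copied between the survivor spaces during the first two minor GCs and promoted to the Old Generation by the third, while short-lived objects never leave Eden.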

The steps above are a quick overview of GC in the Young Generation. The major GC process differs between GC types. Basically, there are five GC types.

1. Serial GC

2. Parallel GC

3. Parallel Compacting GC

4. CMS GC

5. G1 GC

The GC type is selected with a JVM command-line option; for example, -XX:+UseG1GC selects G1 GC.

Monitor Java Garbage Collection

There are several ways to monitor GC. I will list some of the most commonly used ones below.

jstat

jstat is in $JAVA_HOME/bin. You can run it with "jstat -gc <vmid> 1000". vmid is the virtual machine identifier, which is normally the process ID of the JVM. 1000 means the GC data is displayed every 1000 milliseconds (1 second). The meaning of the output columns can be found here.
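If you want to process the jstat output in a script, the header line and the latest data line can be paired up. The function below is a minimal sketch, assuming the output consists of one whitespace-separated header line followed by one or more data lines. The name parse_jstat_gc is made up for this example, and no column names are hard-coded because they differ between JVM versions.

```python
def parse_jstat_gc(output: str) -> dict:
    """Parse the latest sample of `jstat -gc` output into {column: value}."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("expected a header line and at least one data line")
    header, latest = rows[0], rows[-1]
    if len(header) != len(latest):
        raise ValueError("header and data line have different column counts")

    def to_number(text):
        # Fields that are not plain numbers are reported as None.
        try:
            return float(text)
        except ValueError:
            return None

    return {name: to_number(value) for name, value in zip(header, latest)}
```

The input text could come from subprocess.run(["jstat", "-gc", pid], capture_output=True, text=True, check=True).stdout, where pid is the JVM process ID as a string. Without the interval argument, jstat prints a single sample.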

VisualVM

VisualVM is a GUI tool provided by Oracle. It can be downloaded from here.

GarbageCollectorMXBean and GarbageCollectionNotificationInfo

GarbageCollectorMXBean and GarbageCollectionNotificationInfo can be used to collect GC data programmatically. An example can be found here in my GitHub. You can use "mvn jetty:run" to start a Jetty server and observe GC information like the output below.

Minor GC: - 61 (Allocation Failure) start: 2016-02-03 22:22:17.784, end: 2016-02-03 22:22:17.789
        [Eden Space] init:4416K; used:19.2%(13440K) -> 0.0%(0K); committed: 19.2%(13440K) -> 19.2%(13440K)
        [Code Cache] init:160K; used:14.7%(4823K) -> 14.7%(4823K); committed: 14.7%(4832K) -> 14.7%(4832K)
        [Survivor Space] init:512K; used:16.7%(1456K) -> 13.3%(1162K); committed: 19.1%(1664K) -> 19.1%(1664K)
        [Metaspace] init:0K; used:19393K -> 19393K); committed: 19840K -> 19840K)
        [Tenured Gen] init:10944K; used:18.6%(32621K) -> 19.2%(33563K); committed: 19.0%(33360K) -> 19.2%(33616K)
duration:5ms, throughput:99.9%, collection count:61, collection time:213

Major GC: - 6 (Allocation Failure) start: 2016-02-03 22:22:17.789, end: 2016-02-03 22:22:17.839
        [Eden Space] init:4416K; used:0.0%(0K) -> 0.0%(0K); committed: 19.2%(13440K) -> 19.2%(13440K)
        [Code Cache] init:160K; used:14.7%(4823K) -> 14.7%(4823K); committed: 14.7%(4832K) -> 14.7%(4832K)
        [Survivor Space] init:512K; used:13.3%(1162K) -> 0.0%(0K); committed: 19.1%(1664K) -> 19.1%(1664K)
        [Metaspace] init:0K; used:19393K -> 19393K); committed: 19840K -> 19840K)
        [Tenured Gen] init:10944K; used:19.2%(33563K) -> 14.0%(24559K); committed: 19.2%(33616K) -> 19.2%(33616K)
duration:50ms, throughput:99.6%, collection count:6, collection time:228

Alternatively, you can run the GCMonitor class as a Java application. It may take a long time to finish, because it keeps running until a major GC occurs.

References:
[1] http://www.cubrid.org/blog/dev-platform/understanding-java-garbage-collection/
[2] http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
