In this blog post, I explore the capabilities of JDK Mission Control, a powerful tool for low-overhead performance analysis and diagnostics of Java applications.

If you’ve read my previous post, you’ll know that I have been using DeepSeek extensively. However, with DeepSeek’s growing popularity, I have noticed a degradation in service performance, as reflected on the DeepSeek status page. To mitigate this, I have switched to Perplexity Pro, a complimentary service offered to Singtel customers. For those without access to it, Google AI Studio is a good alternative. Having an AI pair programmer significantly speeds up troubleshooting.


Setting up JMC

To begin, download and install JMC from the official JMC 9.0.0 downloads page.

As part of my ongoing work, I aimed to extend Micronaut-Llama3 to support Unsloth’s DeepSeek-R1 and to optimize its inference performance. Since DeepSeek-R1 only supports Q4 and Q8 quantization for the Llama architecture, I opted for the Q8_0 model.

jmc-unsloth-deepseek-r1-distill-llama-8b-gguf

To integrate support for this model, I made the following modifications:

Changes to micronaut/model/ChatFormat.java:

public ChatFormat(Tokenizer tokenizer) {
    this.tokenizer = tokenizer;
    Map<String, Integer> specialTokens = this.tokenizer.getSpecialTokens();
    specialTokens.putIfAbsent("<|begin_of_text|>", 128000); // for DeepSeek-R1
    specialTokens.putIfAbsent("<|end_of_text|>", 128001); // for DeepSeek-R1

    this.beginOfText = getRequiredToken(specialTokens, "<|begin_of_text|>");
    this.startHeader = getRequiredToken(specialTokens, "<|start_header_id|>");
    this.endHeader = getRequiredToken(specialTokens, "<|end_header_id|>");
    this.endOfTurn = getRequiredToken(specialTokens, "<|eot_id|>");
    this.endOfText = getRequiredToken(specialTokens, "<|end_of_text|>");
    this.endOfMessage = specialTokens.getOrDefault("<|eom_id|>", -1); // only in 3.1
    this.stopTokens = Set.of(endOfText, endOfTurn);
}
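
The constructor above relies on a getRequiredToken helper. Its exact implementation lives in the project; conceptually it is just a lookup that fails fast if a required special token is missing. A minimal sketch, assuming nothing beyond that behaviour:

// Hypothetical sketch of the helper referenced above; the project's version may differ.
private static int getRequiredToken(Map<String, Integer> specialTokens, String name) {
    Integer id = specialTokens.get(name);
    if (id == null) {
        throw new IllegalStateException("Missing required special token: " + name);
    }
    return id;
}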

Changes to micronaut/model/Tokenizer.java:

public Tokenizer(Vocabulary vocabulary, List<Pair<Integer, Integer>> merges, String regexPattern,
        Map<String, Integer> specialTokens) {
    specialTokens.putIfAbsent("<|begin_of_text|>", 128000); // for DeepSeek-R1
    specialTokens.putIfAbsent("<|end_of_text|>", 128001); // for DeepSeek-R1

    this.vocabulary = vocabulary;
    this.compiledPattern = regexPattern != null ? Pattern.compile(regexPattern) : null;
    this.specialTokens = new HashMap<>(specialTokens);
    this.merges = new HashMap<>();
    // Map each mergeable token pair to the vocabulary index of its concatenation.
    for (Pair<Integer, Integer> pair : merges) {
        int firstIndex = pair.first();
        int secondIndex = pair.second();
        int mergeIndex = vocabulary.getIndex(vocabulary.get(firstIndex) + vocabulary.get(secondIndex))
                .orElseThrow();
        this.merges.put(pair, mergeIndex);
    }
}
...
public String decode(List<Integer> tokens) {
    String decoded = decodeImpl(tokens);

    // Replace the original decodedBytesAsInts with the below
    int[] decodedBytesAsInts = decoded.codePoints()
            .map(cp -> {
                Integer decodedByte = BYTE_DECODER.get(cp);
                if (decodedByte == null) {
                    return (int) '?';
                }
                return decodedByte;
            })
            .toArray();

    byte[] rawBytes = new byte[decodedBytesAsInts.length];
    for (int i = 0; i < decodedBytesAsInts.length; i++) {
        rawBytes[i] = (byte) decodedBytesAsInts[i];
    }
    return new String(rawBytes, StandardCharsets.UTF_8);
}
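
The important change here is the fallback in the code-point mapping: any code point that has no entry in BYTE_DECODER is decoded as '?' instead of causing the lookup to fail. A standalone sketch of the same fallback, with a stub map standing in for the real BYTE_DECODER:

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class DecodeFallbackDemo {
    // Stub byte-decoder: only 'A' is mapped, everything else is unknown.
    private static final Map<Integer, Integer> BYTE_DECODER = Map.of((int) 'A', (int) 'A');

    public static void main(String[] args) {
        String decoded = "A\u00ff"; // U+00FF has no mapping in the stub
        int[] decodedBytesAsInts = decoded.codePoints()
                .map(cp -> BYTE_DECODER.getOrDefault(cp, (int) '?')) // same fallback, compacted
                .toArray();
        byte[] rawBytes = new byte[decodedBytesAsInts.length];
        for (int i = 0; i < decodedBytesAsInts.length; i++) {
            rawBytes[i] = (byte) decodedBytesAsInts[i];
        }
        System.out.println(new String(rawBytes, StandardCharsets.UTF_8)); // prints "A?"
    }
}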

Profiling with JMC

To begin profiling, start a flight recording from JMC, as shown below:

jmc-start-flight-recorder

Referencing the Llama3.java post, I ran the application using:

gradlew run
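
Alternatively, a recording can be started together with the JVM instead of from the JMC UI, using the standard -XX:StartFlightRecording option; the duration and file name below are only examples:

-XX:StartFlightRecording=duration=120s,filename=llama3.jfr

The flag can be passed to the application JVM, for example via the JAVA_TOOL_OPTIONS environment variable or the Gradle application plugin’s applicationDefaultJvmArgs, and the resulting .jfr file can be opened directly in JMC.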

Configuration in application.properties:

micronaut.application.name=llama3
micronaut.server.port=8888
llama.BatchSize=32
llama.VectorBitSize=512
llama.PreloadGGUF=DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf
options.model_path=DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf
options.temperature=0.1f
options.topp=0.95f
options.seed=42
options.max_tokens=512
options.stream=true
options.echo=true
options.fullResponseStream=true

Test URL:

http://localhost:8888/api/llama3/generate?prompt=Why%20is%20the%20sky%20blue?
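
To drive the endpoint while the recording is running, any HTTP client will do; below is a minimal Java HttpClient sketch (the line-based body handler is just one convenient way to consume the response, given options.stream=true above):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GenerateClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8888/api/llama3/generate?prompt=Why%20is%20the%20sky%20blue?"))
                .GET()
                .build();
        // Print the streamed response as it arrives, line by line.
        client.send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(System.out::println);
    }
}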

The profiling results are as follows:

jmc-automated-analysis-results

Performance Optimization: ByteVector Operations

The flame graph analysis highlighted ByteVector operations as an optimization opportunity:

jmc-flame-graph-lanewise

Optimized code snippet:

// Instead of separate operations
ByteVector loBytes = wBytes.and(MASK_LOW).sub(OFFSET_8);
ByteVector hiBytes = wBytes.lanewise(VectorOperators.LSHR, 4).sub(OFFSET_8);

// Combine operations
ByteVector loBytes = wBytes.and(MASK_LOW);
ByteVector hiBytes = wBytes.lanewise(VectorOperators.LSHR, 4).and(MASK_LOW);
ByteVector combined = loBytes.blend(hiBytes.lanewise(VectorOperators.LSHL, 4), BLEND_MASK);
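
For experimenting with these operations outside the project, here is a self-contained sketch of the nibble-extraction pattern. MASK_LOW and OFFSET_8 are defined locally with assumed values (0x0F and 8) and the input is arbitrary test data; the blend-based variant additionally needs the project's BLEND_MASK, so it is omitted here. Run with --add-modules jdk.incubator.vector:

import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class NibbleUnpackDemo {
    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;
    // Assumed constants: each packed byte carries two 4-bit quantized values.
    static final ByteVector MASK_LOW = ByteVector.broadcast(SPECIES, (byte) 0x0F);
    static final ByteVector OFFSET_8 = ByteVector.broadcast(SPECIES, (byte) 8);

    public static void main(String[] args) {
        byte[] packed = new byte[SPECIES.length()];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = (byte) (i * 31); // arbitrary test data
        }
        ByteVector wBytes = ByteVector.fromArray(SPECIES, packed, 0);

        // Low nibble of each byte, re-centered around zero.
        ByteVector loBytes = wBytes.and(MASK_LOW).sub(OFFSET_8);
        // High nibble, shifted down into the low bits, re-centered around zero.
        ByteVector hiBytes = wBytes.lanewise(VectorOperators.LSHR, 4).sub(OFFSET_8);

        System.out.println(loBytes);
        System.out.println(hiBytes);
    }
}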

JMC vs. VisualVM: A Comparative Analysis

Compared with VisualVM, JMC offers a more sophisticated and lower-overhead profiling experience, making it my preferred tool for optimizing Java applications at scale.

| Feature | VisualVM | JDK Mission Control (JMC) |
| --- | --- | --- |
| Ease of Use | Simple, user-friendly | Advanced, steeper learning curve |
| Performance Overhead | Higher | Lower |
| Flame Graphs | Requires plugins | Built-in |
| Data Granularity | Basic monitoring data | Detailed, in-depth insights |
| Best Use Case | General debugging & profiling | Low-overhead, enterprise-grade profiling |
| Data Collection Method | JMX | Java Flight Recorder (JFR) |

By leveraging JMC, I was able to identify and optimize key performance bottlenecks in my project. If you’re working with Java applications that require in-depth profiling, JMC is a must-have tool.

Stay tuned for more insights on optimizing Java applications!