Debugging unit tests in a Gradle build

22 November 2015

Have you ever needed to analyse why a unit test fails in your Gradle build but works perfectly in your IDE? No problem. Simply run Gradle in non-daemon mode and set the org.gradle.debug property to true.

gradle --no-daemon -Dorg.gradle.debug=true clean check

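The build process then suspends and waits for a debugger on port 5005. Set your breakpoints in your source code and attach your IDE debugger to the Gradle process. If breakpoints inside your test code are never hit, the tests are probably running in a forked JVM; in that case running the test task with --debug-jvm makes the test JVM itself wait for a debugger.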

Split large files using Sed

31 October 2015

In production, log files can grow large in no time. Many common editors cannot handle a large text file properly, so it usually takes quite some time to browse through a large log file.

If you're working with Linux or Mac OS X, sed is a wonderful tool to cut your large log file into pieces. Let’s say you need to analyse a log file called application.log that contains the most interesting content between lines 5400 and 5623. The following command will extract exactly this range for you and print it to stdout, which is your console.

sed -n '5400,5623p' application.log

It might be handy to redirect the output to a new file by using sed -n '5400,5623p' application.log > application5400-5623.log.
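On really large files it may also be worth telling sed to stop reading once the range has been printed, otherwise it keeps scanning to the end of the file. Here is a minimal sketch of that idea. The file sample.log and its content are made up for illustration, so you can try the command without a real log file.

```shell
# Build a stand-in log with 10000 numbered lines (hypothetical sample data).
seq 1 10000 | sed 's/^/line /' > sample.log

# Print lines 5400 to 5623, then quit at line 5623 so sed
# does not read the rest of the file.
sed -n '5400,5623p;5623q' sample.log > sample5400-5623.log

# The extract should contain 5623 - 5400 + 1 = 224 lines.
wc -l < sample5400-5623.log
```

The only change compared to the command above is the extra 5623q, which quits sed after the last line of the range.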

Scan your documents and capture their content

03 April 2013

Currently I am working on a task to scan documents to PDFs and retrieve their content. This article explains how to do it if you do not have a searchable PDF. The following commands have been tested on Ubuntu Linux 12.10 and will most likely work on any other Debian-based distribution.

Step #1 - Install Tesseract

sudo apt-get install tesseract-ocr

Step #2 - Create a simple multi page PDF

To do so I used LibreOffice Writer and saved the document as a PDF. Make sure the document is written in the language you want to capture using OCR.

Step #3 - Use Ghostscript to convert the PDF into a TIFF

gs -o multipage-tiffg4.tif -sDEVICE=tiffg4 multipage-input.pdf

Step #4 - Run Tesseract

The following tells Tesseract to scan the TIFF called multipage-tiffg4.tif using an English dictionary and store the captured output in a file called multipage-tiffg4-ocr-capture.txt. The .txt extension is added by Tesseract itself.

tesseract multipage-tiffg4.tif multipage-tiffg4-ocr-capture -l eng

Step #5 - Review the result

You made it! Enjoy the result.
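If you have to do this for more than one document, steps 3 and 4 can be wrapped in a small shell function. This is only a sketch: ocr_pdf is a name I made up, the output file names are derived from the input name, and the default language assumes Tesseract's three-letter code eng for English.

```shell
# Convert a PDF to a multi-page TIFF and run OCR on it.
# ocr_pdf is a hypothetical helper; the gs and tesseract calls mirror steps 3 and 4.
ocr_pdf() {
    pdf="$1"
    lang="${2:-eng}"      # Tesseract language code, defaults to English (assumption)
    base="${pdf%.pdf}"    # strip the .pdf extension to build the output names

    # Only run Tesseract if Ghostscript succeeded.
    gs -o "${base}.tif" -sDEVICE=tiffg4 "$pdf" &&
        tesseract "${base}.tif" "${base}-ocr-capture" -l "$lang"
}

# Usage: ocr_pdf multipage-input.pdf
```

Calling ocr_pdf multipage-input.pdf would leave the captured text in multipage-input-ocr-capture.txt.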

Using iText to analyze TIFF documents

16 July 2012

Recently I learned that I can use iText to determine the number of pages in a single- or multi-page TIFF document. Here is how it works.

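import com.itextpdf.text.pdf.RandomAccessFileOrArray; // iText 5.x package names assumed
import com.itextpdf.text.pdf.codec.TiffImage;

private byte[] pdfContent; // raw bytes of the TIFF document

int numberOfPages = TiffImage.getNumberOfPages(new RandomAccessFileOrArray(pdfContent));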

Isn’t it simple?

Speeding up compilation time on multi module Maven projects

02 May 2012

I recently found a tweet by Kristian Rosenvold on Twitter about performance improvements for multi-module Maven 2/3 projects. Our build process takes quite some time, so performance improvements are always very welcome on my company’s software project.

The tweet leads to a Gist on GitHub that announces a new version of the Plexus compiler, which is used by the Maven-Compiler-Plugin. Nice! So I added the explicit dependency to the <pluginManagement> section of my root pom.xml (see the listing below).


Then I asked Jenkins to run the build several times, and I was really surprised by the result. On my multi-module project a full build takes about 12-15 minutes. After applying the new Plexus version the build time dropped to about 7-8 minutes. In my case that is a performance improvement of about 30% - 45%!

Groovy magic

30 December 2011

Groovy is just wonderful. Check out the following Groovy listing. With Groovy you can easily access properties and call methods dynamically, using names that are only known at runtime.

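class WellThatIsGroovy {
    String name
    Date bar
}

// The name of the property to read is only known at runtime.
def x = 'name'
def j = new WellThatIsGroovy(name: 'hzasdjkfhjk', bar: new Date())

// A GString after the dot is resolved to a property name at runtime.
println j."${x}"
// A quoted string works the same way; the resulting Date is then formatted.
println j.'bar'.format('dd.MM.yyyy HH:mm:ss')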

Have a go with this script at the Groovy Web Console.

Older posts are available in the archive.