Blog of Silvio Wangler Thoughts about software and stuff…

3 Apr 2013

Scan your documents and capture their content

I am currently working on a task that involves scanning documents to PDF and retrieving their content. This article explains how to do that when the PDF is not searchable. All steps were run on Ubuntu Linux 12.10.

Step 1 - Install Tesseract

sudo apt-get install tesseract-ocr

Step 2 - Create a simple multi page PDF

I used LibreOffice Writer and saved the document as a PDF. Make sure the document contains text in the language you want to capture with OCR.

Step 3 - Use Ghostscript to create a multi page TIFF

gs -o multipage-tiffg4.tif -sDEVICE=tiffg4 multipage-input.pdf

Step 4 - Run Tesseract

The following tells Tesseract to scan the TIFF called multipage-tiffg4.tif using an English dictionary and store the captured output in a file called multipage-tiffg4-ocr-capture.txt. The .txt is added by Tesseract itself.

tesseract multipage-tiffg4.tif multipage-tiffg4-ocr-capture -l eng

Step 5 - Review the result

You made it! Open multipage-tiffg4-ocr-capture.txt and enjoy the result.
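
Since the rest of my work happens in Java, here is a minimal sketch of how steps 3 and 4 could be chained from Java code. The class name OcrPipeline and its method names are made up for this example; the command lines are exactly the ones shown above, and the sketch assumes that gs and tesseract are on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class OcrPipeline {

    // Step 3: Ghostscript renders all PDF pages into one multi page G4 TIFF.
    public static List<String> ghostscriptCommand(String pdf, String tiff) {
        return Arrays.asList("gs", "-o", tiff, "-sDEVICE=tiffg4", pdf);
    }

    // Step 4: Tesseract reads the TIFF and writes <outputBase>.txt.
    public static List<String> tesseractCommand(String tiff, String outputBase, String language) {
        return Arrays.asList("tesseract", tiff, outputBase, "-l", language);
    }

    // Runs one external command and fails loudly on a non-zero exit code.
    public static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
    }

    // Converts the PDF to a TIFF first, then runs OCR on that TIFF.
    public static void pdfToText(String pdf, String tiff, String outputBase, String language)
            throws IOException, InterruptedException {
        run(ghostscriptCommand(pdf, tiff));
        run(tesseractCommand(tiff, outputBase, language));
    }
}
```

Calling OcrPipeline.pdfToText("multipage-input.pdf", "multipage-tiffg4.tif", "multipage-tiffg4-ocr-capture", "eng") would reproduce the manual steps from this article.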

16 Jul 2012

Using iText to analyze TIFF documents

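I recently learned that I can use iText to determine the number of pages of a single or multi page TIFF document. Here is how it works.

import com.itextpdf.text.pdf.RandomAccessFileOrArray; // iText 5 package names assumed
import com.itextpdf.text.pdf.codec.TiffImage;

private byte[] tiffContent; // raw bytes of the TIFF file

int numberOfPages = TiffImage.getNumberOfPages(new RandomAccessFileOrArray(tiffContent));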

Isn't it easy?

2 May 2012

Speeding up compilation time on multi module Maven projects

I recently found a tweet by Kristian Rosenvold about performance improvements for multi module Maven 2/3 projects. Our build process takes quite some time, so performance improvements are always very welcome on my company's software project.

The tweet leads to a Gist on GitHub that announces a new version of the Plexus compiler, which is used by the Maven-Compiler-Plugin. Nice! So I added the explicit dependency to the <pluginManagement> section of my root pom.xml (see the listing below).

<plugin>
	<groupId>org.apache.maven.plugins</groupId>
	<artifactId>maven-compiler-plugin</artifactId>
	<dependencies>
		<dependency>
			<groupId>org.codehaus.plexus</groupId>
			<artifactId>plexus-compiler-javac</artifactId>
			<version>1.8.6</version>
		</dependency>
	</dependencies>
</plugin>

Then I asked Jenkins to run the build several times and I was really surprised by the result. On my multi module project a full build takes about 12-15 minutes. After applying the new Plexus version, the build time dropped to about 7-8 minutes. So in my case the result is a performance improvement of about 30% - 45%!
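
To put the percentage into perspective, here is a small sketch that computes the relative reduction from the build times quoted above. The class and method names are made up for this example. Comparing the extremes of the two ranges brackets the saving between roughly a third and a half of the original build time, which is in line with my estimate.

```java
public class BuildTimeSavings {

    // Relative reduction in percent: (before - after) / before * 100.
    public static double reductionPercent(double beforeMinutes, double afterMinutes) {
        return (beforeMinutes - afterMinutes) / beforeMinutes * 100.0;
    }

    public static void main(String[] args) {
        // Smallest gain the quoted ranges allow: fastest old build vs. slowest new build.
        System.out.printf("worst case: %.0f%%%n", reductionPercent(12, 8));
        // Largest gain the quoted ranges allow: slowest old build vs. fastest new build.
        System.out.printf("best case: %.0f%%%n", reductionPercent(15, 7));
    }
}
```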

11 Apr 2012

How to insert the content of a text file into an MS SQL Server CLOB by using plain SQL?

Let's say we have a table like this and we would like to insert a text file MyTextFile.txt into the CLOB called SCRIPT_CONTENT.

CREATE TABLE RE_SCRIPT_VERSION(
	[ID] [numeric](19, 0) IDENTITY(1,1) NOT NULL,
	[SCRIPT_VERSION] [numeric](19, 0) NOT NULL,
	[ACTIVE] [bit] NOT NULL,
	[DELETED] [bit] NOT NULL,
	[SCRIPT_CONTENT] [varchar](max) NOT NULL, -- column type was garbled in the original listing; varchar(max) is an assumption
	[SCRIPT_COMMENT] [varchar](255) NOT NULL,
	[SCRIPT_CDATE] [datetime] NOT NULL);

So here is how it can be done on Microsoft SQL Server.

INSERT INTO RE_SCRIPT_VERSION
      (SCRIPT_VERSION, ACTIVE, DELETED, SCRIPT_COMMENT, SCRIPT_CDATE, SCRIPT_CONTENT)
      SELECT 1 AS SCRIPT_VERSION,
         1 AS ACTIVE,
         0 AS DELETED,
         'Initial version' AS SCRIPT_COMMENT,
         CURRENT_TIMESTAMP AS SCRIPT_CDATE,
         * FROM OPENROWSET(BULK N'C:\temp\MyTextFile.txt', SINGLE_CLOB) AS SCRIPT_CONTENT;

14 Feb 2012

Admin Tools you’ll need on Windows Server

I recently delivered software (an EAR file) to a customer, and it turned out that the following tools are a must-have.

* A diff tool (such as WinMerge)
* 7-Zip
* Notepad++

30 Dec 2011

Groovy Magic

Groovy is just wonderful. Check out the following Groovy listing. With Groovy you can easily implement dynamic property access and method calls, because the name of a member can be a string that is only known at runtime.

class WellThatIsGroovy {    
    String name
    Date bar
}

def x = 'name'
def j = new WellThatIsGroovy(name : 'hzasdjkfhjk', bar: new Date())

println j."${x}"
println j.'bar'.format('dd.MM.yyyy HH:mm:ss')

Have a go with this script at the Groovy Web Console.

26 Sep 2011

Using Caches in Grails 2.0.

Grails milestone release 2.0.0.M2 is currently available for download and ships with Spring milestone release 3.1.0.M1. This version of the Spring framework introduces a very handy cache abstraction. This post shows how to use Spring's cache abstraction within Grails.

First of all, make sure you are using Grails 2.0.0.M1 or later. Only these versions of Grails run on top of Spring 3.1.

Let's start by enabling Spring's cache abstraction and defining a cache manager by extending Grails' resources.xml.

<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:cache="http://www.springframework.org/schema/cache"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
      http://www.springframework.org/schema/cache http://www.springframework.org/schema/cache/spring-cache.xsd">

    <cache:annotation-driven/>

    <bean id="cacheManager" class="org.springframework.cache.support.SimpleCacheManager">
        <property name="caches">
            <set>
                <bean class="org.springframework.cache.concurrent.ConcurrentMapCacheFactoryBean" p:name="default"/>
            </set>
        </property>
    </bean>
</beans>

The cache manager uses a JDK ConcurrentMap based cache named 'default'. Spring currently also supports EhCache. Now that Spring can cache the return values of Spring services and a cache manager is defined, we are ready to implement a plain old Spring service.

Let's pretend that the following lookup takes ages, so you only want to perform it once per key at runtime. To do so, annotate the service method with @Cacheable, tell Spring to use the cache called 'default' and derive the cache key from the parameter 'key'.

package my.app.slow // 'package' is a reserved word and cannot be part of a package name

import org.springframework.transaction.annotation.Transactional
import org.springframework.transaction.annotation.Propagation
import org.springframework.transaction.annotation.Isolation
import org.springframework.cache.annotation.Cacheable

class MyVerySlowService {    

    @Cacheable(value="default", key="#key")
    @Transactional(propagation = Propagation.SUPPORTS, isolation = Isolation.READ_COMMITTED, readOnly = true)
    List<String> performVerySlowLookup(String key) {
        return VerySlowEntity.findAllByNonIndexedKey(key)
    }
}

After that you're good to go. Run 'grails run-app'. After the first call to 'performVerySlowLookup()' with a given key, Spring serves the return value from the cache instead of executing the method body again.
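
Conceptually, the annotation wraps the method in a lookup like the one below. This is a plain Java sketch of the idea and not Spring's actual implementation; the class CacheableSketch, the method cachedLookup and the call counter are made up for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CacheableSketch {

    // Plays the role of the cache named 'default'.
    private final ConcurrentMap<String, String> defaultCache = new ConcurrentHashMap<>();

    // Counts how often the slow code path really runs.
    public final AtomicInteger slowCalls = new AtomicInteger();

    // Stand-in for the expensive lookup of the service.
    private String performVerySlowLookup(String key) {
        slowCalls.incrementAndGet();
        return "result for " + key;
    }

    // What a cached call boils down to: ask the cache first, compute only on a miss.
    // Two threads that miss at the same time may both compute the value once.
    public String cachedLookup(String key) {
        String cached = defaultCache.get(key);
        if (cached == null) {
            cached = performVerySlowLookup(key);
            defaultCache.putIfAbsent(key, cached);
        }
        return cached;
    }
}
```

Calling cachedLookup twice with the same key runs the slow lookup only once; a new key triggers it again.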

18 Aug 2011

Spring Framework – Understand @Autowired

Yesterday I learned that I had not correctly understood how Spring 3.0.5 (and below) handles @Autowired annotations. Imagine we define a Spring bean using the following annotations and additionally define two lists as Spring beans.

@Component
public class MightyValidator {
   @Autowired
   private List<String> validationRegexes;

   @Autowired
   private List<String> otherMightyStuff;

   // ... some more mighty code ....
}

<beans>
  <util:list id="validationRegexes">
    <value>1</value>
  </util:list>

  <util:list id="otherMightyStuff">
    <value>1</value>
  </util:list>
</beans>

My previous understanding of @Autowired was that it injects the matching Spring bean by analyzing both the type and the name of the field. Then I ran into the case shown above: my @Component class defines several fields of the same data type (in this example java.util.List<String>). I ended up debugging the whole thing and found out that the Spring bean validationRegexes was injected into both List<String> fields of MightyValidator. So what is going on?

Obviously @Autowired is handled differently than I understood. It's more like

Hey I found a List<String> field that has to get auto wired. I will consult my application context and try to find a bean of type List<String>. If I find one I will inject it to the target bean. Done and delivered!
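
To make the difference tangible, here is a toy model of the behaviour described above. It is not Spring's real resolution code; the class AutowireToy and its methods byType and byName are made up for this example. The lookup by type ignores the field name, so the first registered list always wins, while the lookup by name returns exactly the bean that was asked for.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AutowireToy {

    // Stand-in for an application context: bean name -> bean instance.
    private final Map<String, Object> beans = new LinkedHashMap<>();

    public void register(String name, Object bean) {
        beans.put(name, bean);
    }

    // Type driven lookup: the field name plays no role, the first match wins.
    public <T> T byType(Class<T> type) {
        for (Object bean : beans.values()) {
            if (type.isInstance(bean)) {
                return type.cast(bean);
            }
        }
        throw new IllegalStateException("No bean of type " + type.getName());
    }

    // Name driven lookup: returns the bean registered under exactly this name.
    public <T> T byName(String name, Class<T> type) {
        return type.cast(beans.get(name));
    }

    public static void main(String[] args) {
        List<String> regexes = Arrays.asList("regex-1");
        List<String> other = Arrays.asList("other-1");

        AutowireToy context = new AutowireToy();
        context.register("validationRegexes", regexes);
        context.register("otherMightyStuff", other);

        // Both "fields" ask for a List, so both get the first list in the context.
        System.out.println(context.byType(List.class) == regexes);
        // Asking by name yields the second list.
        System.out.println(context.byName("otherMightyStuff", List.class) == other);
    }
}
```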

So how do you solve this issue? There are two options.

  • If your dependency is not optional, use @Resource
  • If you like to stay flexible or if you have to mark a dependency as optional, stay with @Autowired but additionally use @Qualifier

Solution A - Introduce @Resource

@Component
public class MightyValidator {
   @Resource(name = "validationRegexes") // the bean name goes into the 'name' attribute
   private List<String> validationRegexes;

   @Resource(name = "otherMightyStuff")
   private List<String> otherMightyStuff;

   // ... some more mighty code ....
}

Solution B - Add @Qualifier

@Component
public class MightyValidator {
   @Autowired
   @Qualifier("validationRegexes")
   private List<String> validationRegexes;

   @Autowired
   @Qualifier("otherMightyStuff")
   private List<String> otherMightyStuff;

   // ... some more mighty code ....
}

<beans>
  <bean id="validationRegexes" class="org.springframework.beans.factory.config.ListFactoryBean">
    <!-- the qualifier element takes its name in the 'value' attribute -->
    <qualifier value="validationRegexes"/>
    <property name="sourceList">
      <list>
        <value>1</value>
      </list>
    </property>
  </bean>
  <bean id="otherMightyStuff" class="org.springframework.beans.factory.config.ListFactoryBean">
    <qualifier value="otherMightyStuff"/>
    <property name="sourceList">
      <list>
        <value>1</value>
      </list>
    </property>
  </bean>
</beans>

28 Feb 2011

Grails – GORM fails on error

Nice. I just found yet another useful feature in Grails (using 1.3.7). In earlier versions, I think Grails 1.1.x and below, I always had to call validate explicitly.

/*
Lets pretend we have a domain class called User
*/
User user = new User(name:'user', password: 'shht. dont tell anyone')
if (!(user.validate() && user.save())) {
  // print validation errors
}

Since Grails 1.2 there are two ways to make save() throw an exception when validation fails instead of quietly returning null.
First the per-call one:

User user = new User(name:'user', password: 'shht. dont tell anyone')
try {
   user.save(failOnError: true)
} catch (grails.validation.ValidationException e) {
   // print validation errors, available via e.errors
}

And if you want to enable this behaviour globally, just set the following property to true in your Config.groovy:

grails.gorm.failOnError=true

25 Feb 2011

Grails security plugins : Amazing progress!

It has been a while since I developed a web application using Grails, and I was surprised by the huge progress made by the community and SpringSource. The Grails plug-in repository has been growing, and some dedicated plug-ins are officially managed by SpringSource itself. One of them is the Spring Security plug-in. I am familiar with Acegi and its Grails plug-in, and I also knew that Acegi became a Spring sub project called Spring Security. What I had missed since my last visit to the Grails homepage is that there are some amazing security plug-ins that help a developer secure a web application.

The Grails Spring Security plug-in has its roots in the Acegi plug-in but supports Spring Security 3.x. It contains several optional security modules. Here is a list of the modules.

This really looks promising and I am going to take a look at spring-security-core and spring-security-ui since I have started a new Grails web application project on GitHub.
