Scan your documents and capture their content
I am currently working on a task that involves scanning documents to PDF and retrieving their content. This article explains how to do that when the PDF is not searchable. The examples use Ubuntu Linux 12.10.
Step 1 - Install Tesseract
sudo apt-get install tesseract-ocr
Step 2 - Create a simple multi page PDF
I used LibreOffice Writer for this and saved the document as a PDF. Make sure the document is written in the language you want to capture using OCR.
Step 3 - Use ghostscript to create a multi page TIFF
gs -o multipage-tiffg4.tif -sDEVICE=tiffg4 multipage-input.pdf
Step 4 - Run Tesseract
The following tells Tesseract to scan the TIFF called multipage-tiffg4.tif using an English dictionary and store the captured output in a file called multipage-tiffg4-ocr-capture.txt. The .txt is added by Tesseract itself.
tesseract multipage-tiffg4.tif multipage-tiffg4-ocr-capture -l eng
Step 5 - Review the result
You made it! Enjoy the result
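Steps 3 and 4 can also be driven from a program. The following is a minimal Java sketch, assuming gs and tesseract are on the PATH; the class and method names (OcrPipeline, ghostscriptCommand, tesseractCommand, run) are made up for illustration, while the command-line arguments are exactly the ones used above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Ghostscript or Tesseract.
final class OcrPipeline {

    // Step 3: gs -o <tiff> -sDEVICE=tiffg4 <pdf>
    static List<String> ghostscriptCommand(String pdf, String tiff) {
        return Arrays.asList("gs", "-o", tiff, "-sDEVICE=tiffg4", pdf);
    }

    // Step 4: tesseract <tiff> <outputBase> -l <language>
    // Tesseract appends ".txt" to outputBase itself.
    static List<String> tesseractCommand(String tiff, String outputBase, String language) {
        return Arrays.asList("tesseract", tiff, outputBase, "-l", language);
    }

    // Runs one external command and fails loudly on a non-zero exit code.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Exit code " + exitCode + " from " + command);
        }
    }
}
```

Calling run(ghostscriptCommand("multipage-input.pdf", "multipage-tiffg4.tif")) followed by run(tesseractCommand("multipage-tiffg4.tif", "multipage-tiffg4-ocr-capture", "eng")) reproduces the two shell commands.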
Using iText to analyze TIFF documents
I recently learned that iText can determine the number of pages in a single-page or multi-page TIFF document. Here is how it works.
private byte[] pdfContent; // holds the raw bytes of the TIFF document
int numberOfPages = TiffImage.getNumberOfPages(new RandomAccessFileOrArray(pdfContent));
Isn't it easy?
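If iText is not on the classpath, the page count can be approximated with plain Java. This sketch does not use iText; it relies on the TIFF file layout, where every page is stored as an image file directory (IFD) and the IFDs form a linked list starting at the header. The class name TiffPageCounter is an assumption for illustration, and BigTIFF files are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: counts pages by walking the chain of IFDs in a classic TIFF.
final class TiffPageCounter {

    static int countPages(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" = little endian, "MM" = big endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Missing TIFF byte order mark");
        }
        // Bytes 2-3 hold the magic number 42.
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a classic TIFF");
        }
        int pages = 0;
        // Bytes 4-7 hold the offset of the first IFD.
        long offset = buf.getInt(4) & 0xFFFFFFFFL;
        while (offset != 0) {
            // Guard against truncated files and circular IFD chains.
            if (offset + 2 > tiff.length || pages > tiff.length / 6) {
                throw new IllegalArgumentException("Corrupt IFD chain");
            }
            int entries = buf.getShort((int) offset) & 0xFFFF;
            // An IFD is a 2-byte entry count, 12 bytes per entry, then a 4-byte next offset.
            long nextPointer = offset + 2 + 12L * entries;
            if (nextPointer + 4 > tiff.length) {
                throw new IllegalArgumentException("Corrupt IFD chain");
            }
            pages++;
            offset = buf.getInt((int) nextPointer) & 0xFFFFFFFFL;
        }
        return pages;
    }
}
```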
Speeding up compilation time on multi-module Maven projects
I recently found a tweet by Kristian Rosenvold on Twitter about performance improvements for multi-module Maven 2/3 projects. Our build process takes quite a while, so performance improvements are always very welcome on my company's software project.
The tweet leads to a Gist on GitHub that announces a new version of the Plexus compiler used by the maven-compiler-plugin. Nice! So I added the explicit dependency to the <pluginManagement> section of my root pom.xml (see the listing below).
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.codehaus.plexus</groupId>
            <artifactId>plexus-compiler-javac</artifactId>
            <version>1.8.6</version>
        </dependency>
    </dependencies>
</plugin>
Then I asked Jenkins to run the build several times, and I was really surprised by the result. On my multi-module project a full build used to take about 12-15 minutes. After applying the new Plexus version, the build time dropped to about 7-8 minutes. So in my case the result is a performance improvement of about 30% - 45%!
How to insert the content of a text file into an MS SQL Server CLOB by using plain SQL?
Let's say we have a table like this and we would like to insert a text file MyTextFile.txt into the CLOB called SCRIPT_CONTENT.
CREATE TABLE RE_SCRIPT_VERSION(
    [ID] [numeric](19, 0) IDENTITY(1,1) NOT NULL,
    [SCRIPT_VERSION] [numeric](19, 0) NOT NULL,
    [ACTIVE] [bit] NOT NULL,
    [DELETED] [bit] NOT NULL,
    [SCRIPT_CONTENT] [varchar](max) NOT NULL, -- CLOB column; the type is assumed here
    [SCRIPT_COMMENT] [varchar](255) NOT NULL,
    [SCRIPT_CDATE] [datetime] NOT NULL
);
So here is how it can be done on Microsoft SQL Server.
INSERT INTO RE_SCRIPT_VERSION
(SCRIPT_VERSION, ACTIVE, DELETED, SCRIPT_COMMENT, SCRIPT_CDATE, SCRIPT_CONTENT)
SELECT 1 AS SCRIPT_VERSION,
1 AS ACTIVE,
0 AS DELETED,
'Initial version' AS SCRIPT_COMMENT,
CURRENT_TIMESTAMP AS SCRIPT_CDATE,
* FROM OPENROWSET(BULK N'C:\temp\MyTextFile.txt', SINGLE_CLOB) AS SCRIPT_CONTENT;
Admin Tools you’ll need on Windows Server
Recently I delivered software (an EAR file) to a customer, and it turned out that the following tools are a must-have.
* A diff tool (such as WinMerge)
* 7-Zip
* Notepad++
Groovy Magic
Groovy is just wonderful. Check out the following Groovy listing. With Groovy you can easily implement dynamic method calls.
class WellThatIsGroovy {
String name
Date bar
}
def x = 'name'
def j = new WellThatIsGroovy(name : 'hzasdjkfhjk', bar: new Date())
println j."${x}"
println j.'bar'.format('dd.MM.yyyy HH:mm:ss')
Have a go with this script at the Groovy Web Console.
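For comparison, here is roughly what the same dynamic lookup takes in plain Java. This is only a sketch using reflection; the class WellThatIsJava and the helper readProperty are invented for this example and simply mirror the Groovy class above.

```java
import java.lang.reflect.Method;
import java.text.SimpleDateFormat;
import java.util.Date;

final class DynamicAccess {

    // Hypothetical Java counterpart of the Groovy class WellThatIsGroovy.
    static final class WellThatIsJava {
        private final String name;
        private final Date bar;

        WellThatIsJava(String name, Date bar) {
            this.name = name;
            this.bar = bar;
        }

        public String getName() { return name; }
        public Date getBar() { return bar; }
    }

    // Resolves "name" to getName() at runtime and invokes it.
    static Object readProperty(Object target, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method method = target.getClass().getMethod(getter);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read property '" + property + "'", e);
        }
    }

    public static void main(String[] args) {
        String x = "name";
        WellThatIsJava j = new WellThatIsJava("hzasdjkfhjk", new Date());
        System.out.println(readProperty(j, x));
        Date bar = (Date) readProperty(j, "bar");
        System.out.println(new SimpleDateFormat("dd.MM.yyyy HH:mm:ss").format(bar));
    }
}
```

The Groovy version needs none of this plumbing, which is the whole point of the listing above.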
Using Caches in Grails 2.0.
Grails milestone release 2.0.0.M2 is currently available for download and ships with Spring milestone release 3.1.0.M1. This version of the Spring Framework introduces a very handy cache abstraction. This post shows how to use Spring's cache abstraction within Grails.
First of all, make sure you're using Grails 2.0.0.M1 or greater. Only these versions of Grails run on top of Spring 3.1.
Let's start by enabling Spring's cache abstraction and defining a cache manager by extending the Grails resources.xml.
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p="http://www.springframework.org/schema/p"
xmlns:cache="http://www.springframework.org/schema/cache"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/cache http://www.springframework.org/schema/cache/spring-cache.xsd">
<cache:annotation-driven/>
<bean id="cacheManager" class="org.springframework.cache.support.SimpleCacheManager">
<property name="caches">
<set>
<bean class="org.springframework.cache.concurrent.ConcurrentMapCacheFactoryBean" p:name="default"/>
</set>
</property>
</bean>
</beans>
The cache manager uses a JDK ConcurrentMap-based cache named 'default'. Spring currently also supports EhCache. Now that Spring is able to cache the return values of Spring services and a cache manager is defined, we are ready to implement a plain old Spring service.
Let's pretend that the following lookup takes ages, so you only want to perform it once per key at runtime. To do that, annotate the service method with @Cacheable, tell Spring to use the cache called 'default', and generate the cache key from the parameter 'key'.
package my.pkg.slow
import org.springframework.transaction.annotation.Transactional
import org.springframework.transaction.annotation.Propagation
import org.springframework.transaction.annotation.Isolation
import org.springframework.cache.annotation.Cacheable
class MyVerySlowService {
@Cacheable(value="default", key="#key")
@Transactional(propagation = Propagation.SUPPORTS, isolation = Isolation.READ_COMMITTED, readOnly = true)
List<String> performVerySlowLookup(String key) {
return VerySlowEntity.findAllByNonIndexedKey(key)
}
}
After that you're good to go. Run 'grails run-app'. After the first call to the service operation 'performVerySlowLookup()' with a given key, Spring reads the return value from the cache instead of executing the method again.
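To make the effect of the annotation more tangible, here is a hand-written sketch of the same idea in plain Java: a ConcurrentMap keyed by the method argument sits in front of the slow lookup. This is not how Spring implements @Cacheable internally; the class CachedLookup and its members are assumptions made for this illustration.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical stand-in for a cached service method, not a Spring class.
final class CachedLookup {

    // Plays the role of the cache named 'default'.
    private final ConcurrentMap<String, List<String>> cache = new ConcurrentHashMap<>();

    // Plays the role of the expensive method body.
    private final Function<String, List<String>> slowLookup;

    CachedLookup(Function<String, List<String>> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // The slow lookup runs only on a cache miss; later calls with the same
    // key return the stored value.
    List<String> performVerySlowLookup(String key) {
        return cache.computeIfAbsent(key, slowLookup);
    }
}
```

Calling performVerySlowLookup("a") twice triggers the slow function once; a call with a different key triggers it again, because the key decides which cache entry is used.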
Spring Framework – Understand @Autowired
Yesterday I learned that I had not correctly understood how Spring 3.0.5 (and below) handles @Autowired annotations. Imagine we define a Spring bean using the following annotations and additionally define two lists as Spring beans.
@Component
public class MightyValidator {
@Autowired
private List<String> validationRegexes;
@Autowired
private List<String> otherMightyStuff;
// ... some more mighty code ....
}
<beans>
<util:list id="validationRegexes">
<value>1</value>
</util:list>
<util:list id="otherMightyStuff">
<value>1</value>
</util:list>
</beans>
My previous understanding of @Autowired was that it injects the dependent Spring bean by analyzing both its type and the field name. Then I ran into the situation shown in this example: my @Component class defined several fields of the same data type (here java.util.List<String>). I ended up debugging the whole thing and found out that the Spring bean validationRegexes was injected into both List<String> fields of MightyValidator. So what is going on?
Obviously @Autowired is handled differently than I had understood. It's more like
Hey I found a List<String> field that has to get auto wired. I will consult my application context and try to find a bean of type List<String>. If I find one I will inject it to the target bean. Done and delivered!
So how do you solve this? There are two possibilities.
- If your dependency is not optional, use @Resource
- If you like to stay flexible or if you have to mark a dependency as optional, stay with @Autowired but additionally use @Qualifier
Solution A - Introduce @Resource
@Component
public class MightyValidator {
    @Resource(name = "validationRegexes")
    private List<String> validationRegexes;
    @Resource(name = "otherMightyStuff")
    private List<String> otherMightyStuff;
    // ... some more mighty code ....
}
Solution B - Add @Qualifier
@Component
public class MightyValidator {
    @Autowired
    @Qualifier("validationRegexes")
    private List<String> validationRegexes;
    @Autowired
    @Qualifier("otherMightyStuff")
    private List<String> otherMightyStuff;
    // ... some more mighty code ....
}
<beans>
    <bean id="validationRegexes" class="org.springframework.beans.factory.config.ListFactoryBean">
        <qualifier value="validationRegexes"/>
        <property name="sourceList">
            <list>
                <value>1</value>
            </list>
        </property>
    </bean>
    <bean id="otherMightyStuff" class="org.springframework.beans.factory.config.ListFactoryBean">
        <qualifier value="otherMightyStuff"/>
        <property name="sourceList">
            <list>
                <value>1</value>
            </list>
        </property>
    </bean>
</beans>
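The difference between looking a bean up by type and looking it up by name can be shown with a toy container. This is a deliberately simplified model and not Spring's resolution algorithm; ToyContext, byType and byName are invented names. It only reproduces the symptom described above: a lookup that considers nothing but the type cannot tell two lists apart, while a lookup by name can.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a bean container, for illustration only.
final class ToyContext {

    private final Map<String, Object> beans = new LinkedHashMap<>();

    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    // Type-only lookup: returns the first registered bean of the requested type.
    // Generic parameters are erased at runtime, so every List looks the same here.
    <T> T byType(Class<T> type) {
        for (Object bean : beans.values()) {
            if (type.isInstance(bean)) {
                return type.cast(bean);
            }
        }
        throw new IllegalStateException("No bean of type " + type.getName());
    }

    // Name-based lookup: the name selects exactly one bean, the type is only checked.
    <T> T byName(String name, Class<T> type) {
        Object bean = beans.get(name);
        if (bean == null) {
            throw new IllegalStateException("No bean named " + name);
        }
        return type.cast(bean);
    }
}
```

With two lists registered as validationRegexes and otherMightyStuff, byType(List.class) hands back validationRegexes every time, whereas byName("otherMightyStuff", List.class) returns the second list. Naming the wanted bean is what @Resource and @Qualifier add in the two solutions above.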
Grails – GORM fails on error
Nice. I just found yet another useful feature in Grails (using 1.3.7). In earlier versions, I think Grails 1.1.x and below, I always had to call validate explicitly.
/*
Lets pretend we have a domain class called User
*/
User user = new User(name:'user', password: 'shht. dont tell anyone')
if (!(user.validate() && user.save())) {
// print validation errors
}
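Since Grails 1.2 (I am not sure exactly when this feature was introduced) there are two new ways to make sure a failed validation does not go unnoticed. With failOnError enabled, save() throws an exception instead of returning null when validation fails.
First the manual one:
User user = new User(name:'user', password: 'shht. dont tell anyone')
try {
    user.save(failOnError: true)
} catch (grails.validation.ValidationException e) {
    // print validation errors, available via e.errors
}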
And if you want to enable this feature globally, just set the following property to true in your Config.groovy:
grails.gorm.failOnError=true
Grails security plugins : Amazing progress!
It has been a while since I developed a web application using Grails, and I have been surprised by the huge progress made by the community and SpringSource. The Grails plug-in repository has been growing, and some dedicated plug-ins are officially managed by SpringSource itself. One of them is the Spring Security plug-in. I am familiar with Acegi and its Grails plug-in, and I also knew that Acegi became a Spring sub-project called Spring Security. What I had missed since my last visit to the Grails homepage is that there are some amazing security plug-ins that help a developer secure a web application.
The Grails Spring Security plug-in has its roots in the Acegi plug-in but supports Spring Security 3.x. It comes with several optional security modules. Here is a list of the modules.
- Spring Security OpenID which adds support for OpenID authentication
- Spring Security ACL which adds support for object-level and method-level authorization using ACLs
- Spring Security CAS which adds support for single sign-on using Jasig CAS
- Spring Security LDAP which adds support for LDAP and ActiveDirectory authentication
- Spring Security UI which provides CRUD screens and other user management workflows.
- Spring Security Kerberos which adds support for single sign-on using Kerberos
- Spring Security AppInfo which provides a basic UI to view the security configuration
This really looks promising, and I am going to take a look at spring-security-core and spring-security-ui since I have started a new Grails web application project on GitHub.
