Apache and Java Explorations: 2017

Friday, 21 April 2017

Amazon EC2 Linux instance with desktop functionality from Windows

This article describes how to connect to an Amazon EC2 Linux instance desktop by using Windows Remote Desktop.

For purposes of this article, make sure that you are running an instance of Ubuntu 14.04 LTS. In addition, this article assumes the user name is 'ubuntu'.

Connect to your Linux instance as described in Connecting to Your Linux Instance from Windows Using PuTTY.
Run the following commands from the terminal to update the package lists and upgrade the installed packages.

sudo apt-get update
sudo apt-get upgrade

Because you will be connecting from Windows Remote Desktop, edit the sshd_config file on your Linux instance to allow password authentication.

sudo vim /etc/ssh/sshd_config

Change PasswordAuthentication to yes from no, then save and exit.
Restart the SSH daemon to make this change take effect.

sudo /etc/init.d/ssh restart

Temporarily gain root privileges and change the password for the ubuntu user to a complex password to enhance security. Press the Enter key after typing the command passwd ubuntu, and you will be prompted to enter the new password twice.

sudo -i
passwd ubuntu

Switch back to the ubuntu user account and cd to the ubuntu home directory.

su ubuntu
cd

Install Ubuntu desktop functionality on your Linux instance.

export DEBIAN_FRONTEND=noninteractive
sudo -E apt-get update
sudo -E apt-get install -y ubuntu-desktop

Install XRDP and other xfce4 resources.

sudo apt-get install xfce4 xrdp
sudo apt-get install xfce4 xfce4-goodies

Make xfce4 the default window manager for RDP connections.

echo xfce4-session > ~/.xsession

Copy .xsession to the /etc/skel folder so that xfce4 is set as the default window manager for any new user accounts that are created.

sudo cp /home/ubuntu/.xsession /etc/skel

Open the xrdp.ini file to allow changing of the host port you will connect to.

sudo vim /etc/xrdp/xrdp.ini

Look for the section [xrdp1] and change the port setting as follows, then save and exit (:wq).

Change port=-1 to port=ask-1

Restart xrdp.

sudo service xrdp restart

On Windows, open the Remote Desktop Connection client, paste the fully qualified name of your Amazon EC2 instance for the Computer, and then click Connect.

When prompted to log in to xrdp, ensure that the sesman-Xvnc module is selected, and enter the username ubuntu with the new password that you set earlier with passwd. When you start a new session, the port number is -1.

When the system connects, several status messages are displayed on the Connection Log screen. Pay close attention to these status messages and make note of the VNC port number displayed. If you want to return to a session later, specify this number in the port field of the xrdp login dialog box.

Tuesday, 11 April 2017

Introducing Apache Beam

As part of the Google Cloud ecosystem, Google created the Dataflow SDK. Now, as a joint effort of Google, Talend, Cask, data Artisans, PayPal, and Cloudera, the SDK has been contributed to the Apache Incubator as Apache Beam (initially proposed under the name Apache Dataflow).

Architecture and Programming model

Imagine you have a Hadoop cluster on which you run MapReduce jobs. Now you want to “migrate” these jobs to Spark: you have to refactor all your jobs, which requires a lot of work and cost. After that, consider the effort and cost if you want to move to a new platform like Flink: you have to refactor your jobs again.

Dataflow aims to provide an abstraction layer between your code and the execution run-time.

The SDK allows you to use a unified programming model: you implement your data processing logic using the Dataflow SDK, and the same code will run on different back ends (Spark, Flink, etc.). You don’t need to refactor and change the code anymore!

If your target back end is not yet supported by Dataflow, you can implement your own runner for this back end; again, the code using the Dataflow SDK doesn’t change.

Dataflow is able to deal with batch processing jobs, but also with streaming jobs.

Pipelines, translators and runners

Using this SDK, your jobs are designed as pipelines. A pipeline is a chain of processes applied to the data.

It’s basically the only part that you have to write.

Dataflow reads the pipeline definitions and translates them for a target runner such as Spark or Flink. A translator is responsible for adapting the pipeline code to the runner. For instance, the MapReduce translator will transform pipelines into MapReduce jobs, the Spark translator will transform pipelines into Spark jobs, etc.

The runners are the “execution” layer. Once a pipeline has been “translated” by a translator, it can be run directly on a runner. The runner “hides” the actual back end: MapReduce/Yarn cluster, Spark cluster (running on Yarn or Mesos), etc.

Although Dataflow comes with ready-to-use translators and runners, you can also create your own.

For instance, you can implement your own runner by creating a class extending PipelineRunner. You will have to implement the different runner behaviors (such as the transform evaluators, the supported options, the apply main transform hook method, etc.).

SDK:

The SDK is composed of four parts:

Pipeline is the streaming and processing logic that you want to implement. It’s a chain of processes. Basically, in a pipeline, you read data from a source, you apply transformations to the data, and eventually send the data to a destination (named a sink in Dataflow wording).

PCollection is the object transported inside a pipeline. It’s the container of the data, flowing between each step of the pipeline.

Transform is a step of a pipeline. It takes an incoming PCollection and creates an outgoing PCollection. You can implement your own transform function.

Sink and Source are used to retrieve data as input (first step) of a pipeline, and eventually send the data outside of the pipeline.
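To make the four parts more concrete, here is a toy model in plain Java. It is only a sketch and does not use the Dataflow SDK: the names ToyPipeline, Transform, and mapEach are hypothetical, a plain List stands in for a PCollection, the list passed in plays the role of the source, and the returned list plays the role of the sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model only: these are NOT Dataflow SDK classes.
final class ToyPipeline {

    /** A transform takes an incoming collection and creates an outgoing one. */
    interface Transform<I, O> extends Function<List<I>, List<O>> {}

    /** Builds a transform that applies fn to every element. */
    static <I, O> Transform<I, O> mapEach(Function<I, O> fn) {
        return input -> {
            List<O> output = new ArrayList<>();
            for (I element : input) {
                output.add(fn.apply(element));
            }
            return output;
        };
    }

    /** Chains two transforms: trim each line, then measure its length. */
    static List<Integer> run(List<String> source) {
        Transform<String, String> trim = mapEach(String::trim);
        Transform<String, Integer> length = mapEach(String::length);
        return trim.andThen(length).apply(source);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" ab ", "cde"))); // prints [2, 3]
    }
}
```

The point of the sketch is that run only describes the chain of steps; how and where each step executes is left to whatever evaluates it, which is the role the runners play in the real SDK.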

Scala Overview

Scala, short for Scalable Language, is a hybrid functional programming language. Scala integrates features of object-oriented and functional languages, and it is compiled to run on the Java Virtual Machine.

Scala is object oriented

Scala is a pure object-oriented language in the sense that every value is an object. Types and behavior of objects are described by classes.

Classes are extended by subclassing and by a flexible mixin-based composition mechanism that serves as a clean replacement for multiple inheritance.

Scala runs on the JVM

Scala is compiled into Java Byte Code, which is executed by the Java Virtual Machine (JVM). This means that Scala and Java have a common run-time platform. You can easily move from Java to Scala. The Scala compiler compiles your Scala code into Java Byte Code, which can then be executed by the scala command. The scala command is similar to the java command.

Scala is functional

Scala is also a functional language in the sense that every function is a value. Because every value is an object, every function is ultimately an object.

Scala provides a lightweight syntax for defining anonymous functions, it supports higher-order functions, it allows functions to be nested, and supports currying.

Scala can execute Java code

Scala enables you to use all the classes of the Java SDK in Scala, as well as your own custom Java classes or your favourite Java open source projects.

Scala is statically typed

Scala, unlike some of the other statically typed languages, does not expect you to provide redundant type information. You don't have to specify a type in most cases, and you certainly don't have to repeat it.

Scala features that differ from Java

Scala has a set of features, which differ from Java. Some of these are:

All types are objects.
Type inference.
Nested Functions.
Functions are objects.
Domain specific language (DSL) support.
Traits.
Closures.
Concurrency support inspired by Erlang.

Scala is widely used, notably in enterprise web applications. A few of the most popular Scala web frameworks are:

The Lift Framework.
The Play framework.
The Bowler framework.

Environment Setup

The Scala language can be installed on any UNIX-like or Windows system. Before you start installing Scala on your machine, you must make sure that you have Java 1.5 or greater installed on your computer.

Installing Scala on Windows

Java Setup:

First, you must set the JAVA_HOME environment variable and add the JDK's bin directory to your PATH variable. To verify that everything is fine, run the following at the command prompt:

C:\>java -version
java version "1.7.51"
Java(TM) SE Runtime Environment (build 1.7.51-b51)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b05, mixed mode)

Test to see that the Java compiler is installed. Type javac -version. You should see something like the following:

C:\>javac -version
javac 1.7.51

Scala Setup:

You can download Scala from http://www.scala-lang.org/downloads. This example uses scala-2.9.0.1-installer.jar, placed in the C:\ directory. Execute the following command at the command prompt:

C:\>java -jar scala-2.9.0.1-installer.jar

C:\>

This will display an installation wizard, which will guide you through installing Scala on your Windows machine. During installation, it will ask you to accept the license agreement and then ask for a path where Scala will be installed. Once the installation finishes, verify it from the command prompt:

C:\>scala -version

Installing Scala on Linux

Java Setup:

Make sure you have Java JDK 1.7 or greater installed on your computer, set the JAVA_HOME environment variable, and add the JDK's bin directory to your PATH variable. To verify that everything is fine, type java -version at the command prompt and press Enter. You should see something like the following:

$java -version

java version "1.7.51"

Java(TM) 2 Runtime Environment, Standard Edition (build 1.7.51-b03)

Java HotSpot(TM) Server VM (build 1.7.51-b03, mixed mode)

Test to see that the Java compiler is installed. Type javac -version.

$javac -version

javac 1.5.0_22

javac: no source files

Usage: javac <options> <source files>

................................................

Scala Setup:

You can download Scala from http://www.scala-lang.org/downloads. This example uses scala-2.9.0.1-installer.jar, placed in the /tmp directory. Execute the following command at the command prompt:

$java -jar scala-2.9.0.1-installer.jar

Welcome to the installation of scala 2.9.0.1!

The homepage is at: http://scala-lang.org/

press 1 to continue, 2 to quit, 3 to redisplay

................................................

[ Starting to unpack ]

[ Processing package: Software Package Installation (1/1) ]

[ Unpacking finished ]

[ Console installation done ]

$scala -version

Scala Basics

The biggest syntactic difference between Scala and Java is that the ; line end character is optional. A Scala program can be defined as a collection of objects that communicate by invoking each other's methods.

Object - Objects have states and behaviors. Example: A cat has states (color, name) as well as behaviors (eating). An object is an instance of a class.
Class - A class can be defined as a blueprint that describes the behaviors and states that objects of its type support.
Methods - A method is basically a behavior. A class can contain many methods. It is in methods that the logic is written, data is manipulated, and all the actions are executed.
Fields - Each object has its unique set of instance variables, which are called fields. An object's state is created by the values assigned to these fields.

Wednesday, 4 January 2017

Java Coding Standards

1. Introduction

The intent of this document is to create a guide for software source code quality based on the Java standard. The guidelines appearing herein apply to anyone who creates, modifies, or reads software source code for Java.

This document is not a description of a complete software process. A particular group will need to develop their own methodologies and procedures for the specification, design, implementation, testing and deployment of their software systems. This document is simply a set of rules to follow during the implementation phase that will help produce a high quality result.

2. What this process will achieve

You can hope to achieve three goals by requiring a consistent source code style.

· Improve the productivity of existing programmers.
· Allow new programmers to become comfortable with existing source code in less time than would otherwise be necessary.
· Allow existing programmers to move around to different projects easily without having to adjust to the programming style in use by other groups.

Standardizing the "look and feel" of the source code will reduce the time to market, it will reduce the time spent correcting problems in the product line, and it will give engineers the flexibility to jump on and off projects without a large learning curve. This means that you will be able to spend less time on legacy products and jump into the more interesting task of designing and implementing future products.

3. Java

This document delves into some fundamental Java programming techniques and provides a collection of coding practices to be followed by Java/J2EE application development teams.

This document is written for Java software developers to help them:

· Write Java code that is easy to maintain and enhance
· Increase their productivity

3.1. Standards for Classes, Interfaces, Packages, and Compilation Units

3.1.1. Standards for Packages

3.1.1.1. Naming packages

The rules associated with the naming of packages are as follows:

Unless required otherwise, a package name should begin with the organization’s reversed domain name in lower case ASCII letters, starting with the top level domain (com, org, or a two-letter country code as specified in ISO Standard 3166, 1981), i.e. com.<Name of company>, followed by the project name and sub project name.

E.g. com.<companyname>.<modulename>.<other>

Subsequent components of the package name vary according to requirements.

Package names should preferably be singular.

3.1.1.2. Documenting a Package

There should be one or more external documents in HTML format, named after the package, that describe the purpose of the package. They should document the rationale for the package and list the classes and interfaces it contains, with a brief description of each, so that other developers know what the package contains.

3.1.2. Standards for Classes

Naming Classes:

Class names should be simple full English descriptor nouns, in mixed case starting with the first letter capitalized and the first letter of each internal word also capitalized. Whole words should be used instead of acronyms and abbreviations unless the abbreviation is more widely used than the long form, such as URL or HTML.

Class Visibility:

Package (default) visibility may be used for classes internal to a component, while public visibility may be used for classes that other components need to access. However, for a good design, the rule is to be as restrictive as possible when setting the visibility. The reason why a class is public should be documented. Each class should have an appropriate constructor.

Documenting a Class:

The documentation comments for a class start with the class header, giving the filename, version, copyright and related information. The documentation comments should precede the definition of the class and should contain the necessary information about the purpose of the class, details of any known bugs, examples, etc. The development/maintenance history of the class should be entered as comments in the configuration management tool at the time of baselining the source code, and in the file header as well.

3.1.3. Standards for interfaces

The Java convention is to name interfaces using mixed case with the first letter of each word capitalized, like classes. The preferred convention for the name of an interface is to use a descriptive adjective, such as Runnable or Cloneable. Interfaces should be documented, specifying the purpose of the interface and how it should and shouldn’t be used. Method declarations in interfaces should explicitly declare the methods as public for clarity.

3.1.4. Standards for Methods

Naming Methods:

Methods should be named using a full English description, using mixed case with the first letter of any non-initial word capitalized. It is also common practice for the first word of a method name to be a strong, active verb, e.g. getValue(), printData(), save(), delete(). This convention results in methods whose purpose can often be determined just by looking at their names. It is recommended that accessor methods be used to improve the maintainability of classes.

Getters:

Getters are methods that return the value of a field. The word ‘get’ should be prefixed to the name of the field, unless it is a boolean field, where ‘is’ should be prefixed instead, e.g. getTotalSales(), isPersistent(). Alternatively, the prefix ‘has’ or ‘can’ may be used instead of ‘is’ for boolean getters, giving names such as hasDependents() and canPrint(). Getters should be made protected by default, so that only subclasses can access the fields. The exception is when an ‘outside class’ needs to access the field, in which case the getter may be made public while the setter stays protected.

Setters:

Setters, also known as mutators, are methods that modify the value of a field. The word ‘set’ should be prefixed to the name of the field for this type of method. Example: setTotalSales(), setPersistent(boolean isPersistent)
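The getter and setter conventions above can be sketched as follows. The class SalesAccount and its fields are hypothetical, chosen only to match the method names used in the text; the getters are public on the assumption that outside classes need to read the fields, while the setters stay protected.

```java
// Hypothetical class illustrating the naming rules for getters and setters.
class SalesAccount {
    private double totalSales;
    private boolean persistent;

    /** Getter: 'get' prefixed to the field name. */
    public double getTotalSales() {
        return totalSales;
    }

    /** Boolean getter: 'is' prefixed to the field name. */
    public boolean isPersistent() {
        return persistent;
    }

    /** Setter: 'set' prefixed to the field name; protected, as suggested above. */
    protected void setTotalSales(double totalSales) {
        this.totalSales = totalSales;
    }

    protected void setPersistent(boolean persistent) {
        this.persistent = persistent;
    }
}
```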

Getters for constants:

Constant values may need to be changed over a period of time. Therefore constants should be implemented as getter methods. By using accessors for constants there is only one source to retrieve the value. This increases the maintainability of system.

Accessors for collections:

The main purpose of accessors is to encapsulate access to fields. Collections, such as arrays and vectors, need a getter and a setter method, and because elements can be added to and removed from collections, accessor methods for those operations need to be included as well. The advantage of this approach is that the collection is fully encapsulated, allowing later changes such as replacing it with another structure, like a linked list.

Examples: getOrderItems(), setOrderItems(), insertOrderItem(), deleteOrderItem(), newOrderItem()
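A minimal sketch of collection accessors, using some of the method names listed above. The Order class is hypothetical, and returning a read-only view from the getter is a design choice made here (not something the standard requires) so that callers must go through insertOrderItem() and deleteOrderItem().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class: the backing list never escapes, so it could later be
// replaced by another structure without touching callers.
class Order {
    private final List<String> orderItems = new ArrayList<>();

    /** Returns a read-only view of the items. */
    public List<String> getOrderItems() {
        return Collections.unmodifiableList(orderItems);
    }

    /** Replaces all items with a copy of the given list. */
    public void setOrderItems(List<String> newItems) {
        List<String> copy = new ArrayList<>(newItems);
        orderItems.clear();
        orderItems.addAll(copy);
    }

    public void insertOrderItem(String item) {
        orderItems.add(item);
    }

    /** Returns true if the item was present and has been removed. */
    public boolean deleteOrderItem(String item) {
        return orderItems.remove(item);
    }
}
```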

Method Visibility:

For a good design, the rule is to be as restrictive as possible when setting the visibility of a method. If a method cannot be private, it should have the default (package) access modifier; if it cannot have default access, it should be made protected; and only if it cannot be protected should it be made public. Wherever a method is made more visible, the reason should be documented.

Access modifier for methods should be explicitly mentioned in cases like interfaces where the default permissible access modifier is public.

Standards for Parameters (Arguments) to Methods :

Parameters should be named following the same conventions as for local variables. Parameters to a method are documented in the header documentation for the method using the javadoc @param tag.

However:

· Cascading method calls like method1().method2() should be avoided.
· Overloading methods on argument type should be avoided.
· It should be declared when a class or method is thread-safe.
· Synchronized methods should be preferred over synchronized blocks.
· The fact that a method invokes wait should always be documented.
· Abstract methods should be preferred in base classes over those with default implementations.
· All possible overflow or underflow conditions should be checked for a computation.

There should be no space between a method/constructor name and the parenthesis but there should be a blank space after commas in argument lists.

3.1.5. Naming Convention Standards

Naming Variables:

Use a full English descriptor for variable names to make it obvious what the field represents. Fields that are collections, such as arrays or vectors, should be given plural names to indicate that they represent multiple values. Variable names should not start with an underscore (_) or dollar sign ($) and should be short and meaningful. The choice of a variable name should be mnemonic, i.e. designed to indicate to the casual observer the intent of its use. Single character variable names should be avoided except for temporary “throwaway” variables.

Naming Components:

For names of components, a full English descriptor should be used, suffixed with the component type. This makes it easy to identify the purpose of the component as well as its type, and easier to find each component in a list. Therefore names like NewHelpMenuItem and CloseButton should be preferred over Button1, Button2, etc.

Naming Constants:

Constants, whose values do not change, are typically implemented as static final fields of classes. They should be represented with full English words, all in uppercase, with underscores between the words, like FINAL_VALUE.

Naming Collections:

A collection, such as an array or a vector, should be given a pluralized name representing the types of objects stored by the array. The name should be a full English descriptor with the first letter of all non-initial words capitalized like customers, orderItems, aliases.

Naming Local Variables:

In general, local variables are named following the same conventions as used for fields, in other words use of full English descriptors with the first letter of any non-initial word in uppercase. For the sake of convenience, however, this naming convention is relaxed for several specific types of local variable like Streams, Loop counters, Exceptions. Name hiding or data hiding refers to the practice of naming a local variable, argument, or methods the same or similar as that of another one of greater scope in same or super class. This may lead to confusion and should be avoided.

Naming Streams:

When there is a single input and/or output stream being opened, used, and then closed within a method the common convention is to use ‘in’ and ‘out’ for the names of these streams, respectively.

Naming Loop Counters:

Loop counters are a very common use for local variables therefore the use of i, j, or k, is acceptable for loop counters where they are obvious. However, if these names are used for loop counters, they should be used consistently. For complex nested loops the counters should be given full meaningful English descriptors.

3.1.6. Comments on class methods

Each method should declare the javadoc tags exactly in the sequence as given below. Each line item begins with an asterisk. All subsequent lines in multiline component are to be indented so that they line up vertically with the previous line. For reference, the javadoc tags are explained in detail in Annexure.

Example:

/**
 * Description:
 * @param <Mandatory Tag> for description of each parameter
 * @return <Mandatory Tag, except for constructors and void methods>
 * @exception <Optional Tag>
 * @see <Optional Tag>
 * @since <Optional Tag>
 * @deprecated <Optional Tag>
 */
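As an illustration, a filled-in method comment following the tag order above might look like the sketch below. The class PriceCalculator and the method applyDiscount are hypothetical; the optional @see and @deprecated tags are omitted because they do not apply here.

```java
// Hypothetical class used only to show the tag order on a real method.
final class PriceCalculator {

    /**
     * Description: Applies a percentage discount to a price.
     *
     * @param price the original price; must not be negative
     * @param discountPercent the discount to apply, from 0 to 100
     * @return the price after the discount has been applied
     * @exception IllegalArgumentException if either argument is out of range
     * @since 1.0
     */
    static double applyDiscount(double price, double discountPercent) {
        if (price < 0 || discountPercent < 0 || discountPercent > 100) {
            throw new IllegalArgumentException("price or discount out of range");
        }
        return price * (100 - discountPercent) / 100;
    }
}
```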

3.1.7. Best Practices:

Efficient String Concatenations

To build a long string out of many small strings, use the append method of java.lang.StringBuilder (or java.lang.StringBuffer when the buffer is shared between threads) rather than the ordinary ‘+’ operator, particularly inside loops.
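A small sketch of the append approach. It uses java.lang.StringBuilder, the unsynchronized counterpart of StringBuffer with the same append method, on the assumption that the string is built by a single thread. The class name CsvJoiner is hypothetical.

```java
import java.util.List;

// Hypothetical helper: builds one string from many parts with a single buffer.
final class CsvJoiner {

    /** Joins the parts with commas, appending to one builder inside the loop. */
    static String join(List<String> parts) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                builder.append(',');
            }
            builder.append(parts.get(i));
        }
        return builder.toString();
    }
}
```

Using ‘+’ inside the loop would create a new intermediate String on every iteration, whereas the builder grows a single internal buffer.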

Optimal use of Garbage collection

To ease the work of the Java garbage collector, explicitly set reference variables to ‘null’ once the objects they refer to are no longer required by the application. This de-references the object and allows the garbage collector to reclaim it, thus releasing memory.

Use variables in any code optimally

Try to keep the number of variables in a JSP/Java class to a minimum. Where different algorithms in the same JSP/Java class can share a variable, reuse it by setting the already populated variable to ‘null’ and then populating it again with the new value.

Try to reduce the number of hits to database

The number of hits to the database should be reduced to a minimum. Fetch the data in a well arranged form in as few hits as possible by making the best use of joins in the database query itself, rather than fetching dispersed data over many hits.

Heavy Object should not be stored in sessions

Storing heavy objects in the Session can lead to slowing of the running of the JSP page so such case should be avoided.

Always use database connection pooling

Always use javax.sql.DataSource which is obtained through a JNDI naming lookup. Avoid the overhead of acquiring a javax.sql.DataSource for each SQL access. This is an expensive operation that will severely impact the performance and scalability of the application.

Release database resources when done

Failing to close and release JDBC connections can cause other users to experience long waits for connections. Although a JDBC connection that is left unclosed will be reaped and returned by the application server after a timeout period, others may have to wait for this to occur. Close JDBC statements when you are through with them. JDBC ResultSets can be explicitly closed as well. If not explicitly closed, ResultSets are released when their associated statements are closed. Ensure that your code is structured to close and release JDBC resources in all cases, even in exception and error conditions.
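One way to guarantee release in all cases on Java 7 and later is the try-with-resources statement. The sketch below uses a hypothetical TrackedResource as a stand-in so that it runs without a database; java.sql.Connection, Statement, and ResultSet all implement AutoCloseable, so the same pattern applies to them.

```java
// Sketch only: TrackedResource is a hypothetical stand-in for a JDBC resource.
final class ResourceDemo {

    static final class TrackedResource implements AutoCloseable {
        private boolean closed;

        /** Simulates a query that may fail. */
        void use(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated query failure");
            }
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Uses the resource and reports whether it ended up closed. */
    static boolean closedAfterUse(boolean fail) {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.use(fail);
        } catch (IllegalStateException e) {
            // close() has already run by the time this handler executes.
        }
        return resource.isClosed();
    }
}
```

Both closedAfterUse(false) and closedAfterUse(true) return true: the resource is closed on the normal path and on the exception path alike.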

Minimum use of System.out.println

Because it seems harmless, this commonly used application development legacy is overlooked for the performance problem it really is. Because System.out.println statements and similar constructs synchronize processing for the duration of disk I/O, they can significantly slow throughput.

Synchronization has a cost

· Putting synchronized all over the place does not ensure thread safety.
· Putting synchronized all over the place is likely to deadlock.
· Putting synchronized all over will slow your code and prevent it from running when it should, because threads block while waiting for locks.

3.1.8. Technical points

Apart from the standards already mentioned, the following points should be considered while writing Java code:

· Instance /class variables should not be made public as far as possible.
· A constructor or method must explicitly declare all unchecked (i.e. runtime) exceptions it expects to throw. The caller can use this documentation to provide the proper arguments.
· Unchecked exceptions should not be used instead of code that checks for an exceptional condition, e.g. comparing an index with the length of an array is faster to execute and better documented than catching ArrayIndexOutOfBoundsException.
· If Object.equals is overridden, also override Object.hashCode, and vice-versa.
· Override readObject and writeObject if a Serializable class relies on any state that could differ across processes, including, in particular, hashCodes and transient fields.
· If clone() may be called in a class, then it should be explicitly defined, and declare the class as implements Cloneable.
· Always use method equals instead of operator == when comparing objects. In particular, do not use == to compare Strings unless comparing memory locations.
· Always embed wait statements in while loops that re-wait if the condition being waited for does not hold.
· Use notifyAll instead of notify or resume when you do not know exactly how many threads are waiting for something.
· When throwing an exception, do not refer to the name of the method which has thrown it but specify instead some explanatory text.
· Document fragile constructions that have been used solely for the sake of optimization.
· Document cases where the return value of a called method is ignored.
· Minimize * forms of import. Be precise about what you are importing.
· Prefer declaring arrays as Type[] arrayName rather than Type arrayName[].
· StringBuffer should be preferred for cases involving String concatenations. Wherever required, String objects should preferably be created with new rather than by assigning a literal, unless pooling is intended, because literals remain in the String pool even after the reference is nullified.
· All class variables must be initialized with null at the point of declaration.
· All references to objects should be explicitly assigned ‘null’ when no more in use to make the objects available for garbage collection.
· As far as possible static or class fields should be explicitly instantiated by use of static initializers because instances of a class may sometimes not be created before a static field is accessed.
· Minimize statics (except for static final constants).
· Minimize direct internal access to instance variables inside methods.
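· Declare public methods as synchronized where the class is meant to be shared between threads; if not, document the assumed invocation context or the rationale for the lack of synchronization.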
· Always document the fact that a method invokes wait.
· Classes designed should be easily extensible. This will be very important in the event that the currently designed project needs to be enhanced at a later stage.
· It is very important to have the finally clause (whenever required) because its absence can cause memory leaks and leave connections open.
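Two of the points above, overriding equals together with hashCode and comparing objects with equals rather than ==, can be sketched with a small value class. CustomerId and its fields are hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class: equals and hashCode are overridden together.
final class CustomerId {
    private final String region;
    private final int number;

    CustomerId(String region, int number) {
        this.region = region;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CustomerId)) {
            return false;
        }
        CustomerId that = (CustomerId) other;
        return number == that.number && Objects.equals(region, that.region);
    }

    @Override
    public int hashCode() {
        // Built from the same fields that equals compares.
        return Objects.hash(region, number);
    }

    public static void main(String[] args) {
        CustomerId a = new CustomerId("EU", 7);
        CustomerId b = new CustomerId("EU", 7);
        Set<CustomerId> ids = new HashSet<>();
        ids.add(a);
        ids.add(b);
        System.out.println(a == b);      // false: two distinct objects
        System.out.println(a.equals(b)); // true: same region and number
        System.out.println(ids.size());  // 1: the set treats them as one entry
    }
}
```

If hashCode were left at its default, the HashSet could keep both objects even though equals reports them as equal, which is why the two methods are overridden as a pair.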

Thanks