Friday 21 April 2017

Amazon EC2 Linux instance with desktop functionality from Windows

This article describes how to connect to an Amazon EC2 Linux instance desktop by using Windows Remote Desktop.

For purposes of this article, make sure that you are running an instance of Ubuntu 14.04 LTS. In addition, this article assumes the user name is 'ubuntu'.

  1. Connect to your Linux instance as described in Connecting to Your Linux Instance from Windows Using PuTTY.
  2. Run the following commands from the terminal to update the package lists and upgrade the installed packages.
    • sudo apt-get update
    • sudo apt-get upgrade
  3. Because you will be connecting from Windows Remote Desktop, edit the sshd_config file on your Linux instance to allow password authentication.
    • sudo vim /etc/ssh/sshd_config
  4. Change PasswordAuthentication from no to yes, then save and exit.
  5. Restart the SSH daemon to make this change take effect.
    • sudo /etc/init.d/ssh restart
  6. Temporarily gain root privileges and change the password for the ubuntu user to a complex password to enhance security. Press the Enter key after typing the command passwd ubuntu, and you will be prompted to enter the new password twice.
    • sudo -i
    • passwd ubuntu
  7. Switch back to the ubuntu user account and cd to the ubuntu home directory.
    • su ubuntu
    • cd
  8. Install Ubuntu desktop functionality on your Linux instance.
    • export DEBIAN_FRONTEND=noninteractive
    • sudo -E apt-get update
    • sudo -E apt-get install -y ubuntu-desktop
  9. Install XRDP and other xfce4 resources.
    • sudo apt-get install xfce4 xrdp
    • sudo apt-get install xfce4 xfce4-goodies
  10. Make xfce4 the default window manager for RDP connections.
    • echo xfce4-session > ~/.xsession
  11. Copy .xsession to the /etc/skel folder so that xfce4 is set as the default window manager for any new user accounts that are created.
    • sudo cp /home/ubuntu/.xsession /etc/skel
  12. Open the xrdp.ini file so that you can change the host port you will connect to.
    • sudo vim /etc/xrdp/xrdp.ini
  13. Look for the section [xrdp1] and make the following change (then save and exit with :wq).
    • Change port=-1 to port=ask-1
  14. Restart xrdp.
    • sudo service xrdp restart
On Windows, open the Remote Desktop Connection client, paste the fully qualified name of your Amazon EC2 instance into the Computer field, and then click Connect.

When prompted to log in to xrdp, ensure that the sesman-Xvnc module is selected, and enter the username ubuntu with the new password that you created in step 6. When you start a session, the port number is -1.

When the system connects, several status messages are displayed on the Connection Log screen. Pay close attention to these status messages and make note of the VNC port number displayed. If you want to return to a session later, specify this number in the port field of the xrdp login dialog box.


Tuesday 11 April 2017

Introducing Apache Beam

As part of the Google Cloud ecosystem, Google created the Dataflow SDK. Now, as a joint effort by Google, Talend, Cask, data Artisans, PayPal, and Cloudera, the project has entered the Apache Incubator, where it is known as Apache Beam.

Architecture and Programming model

Imagine you have a Hadoop cluster running MapReduce jobs. Now you want to “migrate” these jobs to Spark: you have to refactor all of them, which takes a lot of work and money. Then consider the effort and cost if you later move to a new platform such as Flink: you have to refactor your jobs yet again.

Dataflow aims to provide an abstraction layer between your code and the execution runtime.
The SDK gives you a unified programming model: you implement your data processing logic using the Dataflow SDK, and the same code runs on different back ends (Spark, Flink, etc.). You no longer need to refactor and change the code!

If your target back end is not yet supported by Dataflow, you can implement your own runner for it. Again, the code written against the Dataflow SDK doesn’t change.

Dataflow is able to deal with batch processing jobs, but also with streaming jobs.

Pipelines, translators and runners

Using this SDK, your jobs are designed as pipelines. A pipeline is a chain of processing steps applied to the data.

It’s basically the only part that you have to write.

Dataflow reads the pipeline definitions and translates them for a target runner such as Spark or Flink. A translator is responsible for adapting the pipeline code to the runner. For instance, the MapReduce translator transforms pipelines into MapReduce jobs, the Spark translator transforms pipelines into Spark jobs, and so on.

The runners are the “execution” layer. Once a pipeline has been “translated” by a translator, it can be run directly on a runner. The runner “hides” the actual back end: MapReduce/Yarn cluster, Spark cluster (running on Yarn or Mesos), etc.

Dataflow comes with ready-to-use translators and runners, but you can also create your own.
For instance, you can implement your own runner by creating a class extending PipelineRunner. You will have to implement different runner behaviors (such as the transform evaluators, supported options, the apply main transform hook method, etc.).

SDK:

The SDK is composed of four parts:

Pipelines are the streaming and processing logic that you want to implement. A pipeline is a chain of processes: you read data from a source, apply transformations to the data, and eventually send the data to a destination (called a sink in Dataflow terminology).

PCollection is the object transported inside a pipeline. It’s the container of the data, flowing between each step of the pipeline.

Transform is a step of a pipeline. It takes an incoming PCollection and creates an outgoing PCollection. You can implement your own transform function.

Source and Sink are used, respectively, to retrieve data as the input (first step) of a pipeline and to eventually send the data outside of the pipeline.
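
To make the four parts more concrete, here is a toy model in plain Scala. It is only a sketch of the idea: the names Coll, Transform, source, and sink are hypothetical stand-ins invented for this example and are not the Dataflow SDK API.

```scala
// Toy model only: these types are hypothetical and are NOT the Dataflow SDK API.
object PipelineSketch {
  // Stand-in for a PCollection: the container of data flowing between steps.
  final case class Coll[A](items: List[A])

  // A transform takes an incoming collection and creates an outgoing one.
  trait Transform[A, B] { def apply(in: Coll[A]): Coll[B] }

  // Source: the first step, bringing data into the pipeline.
  def source(lines: List[String]): Coll[String] = Coll(lines)

  // Transform 1: split each line into words.
  val splitWords: Transform[String, String] = new Transform[String, String] {
    def apply(in: Coll[String]): Coll[String] =
      Coll(in.items.flatMap(_.split("\\s+").toList).filter(_.nonEmpty))
  }

  // Transform 2: count the occurrences of each word, sorted by word.
  val countWords: Transform[String, (String, Int)] =
    new Transform[String, (String, Int)] {
      def apply(in: Coll[String]): Coll[(String, Int)] =
        Coll(in.items.groupBy(identity)
          .map { case (word, ws) => (word, ws.size) }
          .toList.sortBy(_._1))
    }

  // Sink: the last step, sending the data outside of the pipeline
  // (here it is simply formatted as strings).
  def sink(out: Coll[(String, Int)]): List[String] =
    out.items.map { case (word, n) => s"$word=$n" }

  // The pipeline: source -> transforms -> sink.
  def run(lines: List[String]): List[String] =
    sink(countWords(splitWords(source(lines))))

  def main(args: Array[String]): Unit =
    run(List("a b a", "b c")).foreach(println)
}
```

In the real SDK a runner would execute these steps on the chosen back end. Here everything runs locally in a single method call.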

Scala Overview

Scala, short for Scalable Language, is a hybrid functional programming language: it integrates features of object-oriented and functional languages, and it is compiled to run on the Java Virtual Machine.

Scala is object oriented

Scala is a pure object-oriented language in the sense that every value is an object. Types and behavior of objects are described by classes and traits, which will be explained in subsequent chapters.
Classes are extended by subclassing and by a flexible mixin-based composition mechanism that serves as a clean replacement for multiple inheritance.
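
As a small illustration of that composition mechanism, the sketch below mixes two traits into one class. The names Named, Greeter, Shouter, and Person are made up for this example.

```scala
object MixinSketch {
  // A trait declaring what implementers must provide.
  trait Named { def name: String }

  // Two independent traits that each add behavior on top of Named.
  trait Greeter extends Named { def greet: String = s"Hello, $name" }
  trait Shouter extends Named { def shout: String = name.toUpperCase + "!" }

  // One class picks up both behaviors by mixing the traits in with `with`.
  class Person(val name: String) extends Greeter with Shouter

  def main(args: Array[String]): Unit = {
    val p = new Person("ada")
    println(p.greet)
    println(p.shout)
  }
}
```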

Scala runs on the JVM

Scala is compiled into Java bytecode, which is executed by the Java Virtual Machine (JVM). This means that Scala and Java share a common runtime platform, so you can easily move from Java to Scala. The Scala compiler compiles your Scala code into Java bytecode, which can then be executed by the scala command. The scala command is similar to the java command.

Scala is functional

Scala is also a functional language in the sense that every function is a value. Because every value is an object, every function is ultimately an object too.
Scala provides a lightweight syntax for defining anonymous functions, supports higher-order functions, allows functions to be nested, and supports currying.
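
The following sketch shows each of these four features once. The function names are arbitrary and chosen only for illustration.

```scala
object FunctionalSketch {
  // Anonymous function stored in a value: the function is an ordinary object.
  val double: Int => Int = x => x * 2

  // Higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Nested function: square is visible only inside sumOfSquares.
  def sumOfSquares(xs: List[Int]): Int = {
    def square(n: Int): Int = n * n
    xs.map(square).sum
  }

  // Curried method: the arguments are supplied one parameter list at a time.
  def add(a: Int)(b: Int): Int = a + b

  // Supplying only the first argument yields a new function.
  val addTen: Int => Int = add(10)

  def main(args: Array[String]): Unit = {
    println(applyTwice(double, 3))
    println(sumOfSquares(List(1, 2, 3)))
    println(addTen(5))
  }
}
```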

Scala can execute Java code

Scala enables you to use all the classes of the Java SDK, as well as your own custom Java classes and your favourite Java open source projects.
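
For example, the standard java.util collections can be used directly from Scala code. This is a minimal sketch, and the list contents are arbitrary.

```scala
import java.util.{ArrayList, Collections}

object JavaInteropSketch {
  // Builds and sorts a plain java.util.ArrayList from Scala.
  def sortedNames(): java.util.List[String] = {
    val names = new ArrayList[String]()
    names.add("scala")
    names.add("java")
    names.add("beam")
    Collections.sort(names) // a static Java method, called as-is
    names
  }

  def main(args: Array[String]): Unit =
    println(sortedNames())
}
```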

Scala is statically typed

Scala, unlike some of the other statically typed languages, does not expect you to provide redundant type information. You don't have to specify a type in most cases, and you certainly don't have to repeat it.
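
A short sketch of what this looks like in practice: the compiler works out the types on the right-hand side, so most annotations can be left off. The value names are illustrative only.

```scala
object InferenceSketch {
  val count = 42                      // inferred as Int
  val names = List("Lift", "Play")    // inferred as List[String]
  val lengths = names.map(_.length)   // inferred as List[Int]

  // The return type is inferred as Int from the method body.
  def total(xs: List[Int]) = xs.sum

  // An explicit annotation is still allowed when you want one.
  val ratio: Double = 0.5

  def main(args: Array[String]): Unit = {
    println(lengths)
    println(total(List(1, 2, 3)))
  }
}
```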

Scala features that differ from Java

Scala has a set of features that differ from Java. Some of these are:
  • All types are objects.
  • Type inference.
  • Nested Functions.
  • Functions are objects.
  • Domain specific language (DSL) support.
  • Traits.
  • Closures.
  • Concurrency support inspired by Erlang.
Scala is widely used, notably in enterprise web applications. Here are a few of the most popular Scala web frameworks:
  • The Lift Framework.
  • The Play framework.
  • The Bowler framework.

Environment Setup

The Scala language can be installed on any UNIX-like or Windows system. Before you start installing Scala on your machine, you must make sure that you have Java 1.5 or greater installed on your computer.

Installing Scala on Windows


Java Setup:

First, you must set the JAVA_HOME environment variable and add the JDK's bin directory to your PATH variable. To verify that everything is set up correctly, run the following at the command prompt:

C:\>java -version 
java version "1.7.51" 
Java(TM) SE Runtime Environment (build 1.7.51-b51) 
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b05, mixed mode) 
C:\>

Test to see that the Java compiler is installed. Type javac -version. You should see something like the following:

C:\>javac -version
javac 1.7.51
C:\> 

Scala Setup:

You can download Scala from http://www.scala-lang.org/downloads. This example uses scala-2.9.0.1-installer.jar, placed in the C:\ directory. Execute the following command at the command prompt:

C:\>java -jar scala-2.9.0.1-installer.jar 
C:\>

This displays an installation wizard that guides you through installing Scala on your Windows machine. During installation, accept the license agreement when asked, then provide the path where Scala will be installed. When the installation is complete, verify it by checking the version:

C:\>scala -version 
Scala code runner version 2.9.0.1 -- Copyright 2002-2011, LAMP/EPFL 
C:\>

Installing Scala on Linux


Java Setup:

Make sure you have Java JDK 1.7 or greater installed on your computer, set the JAVA_HOME environment variable, and add the JDK's bin directory to your PATH variable. To verify that everything is set up correctly, type java -version at the command prompt and press Enter. You should see something like the following:

$java -version 
java version "1.7.51" 
Java(TM) 2 Runtime Environment, Standard Edition (build 1.7.51-b03) 
Java HotSpot(TM) Server VM (build 1.7.51-b03, mixed mode) 
$

Test to see that the Java compiler is installed. Type javac -version.

$javac -version 
javac 1.5.0_22 
javac: no source files 
Usage: javac <options> <source files> 
................................................ 
$

Scala Setup:

You can download Scala from http://www.scala-lang.org/downloads. This example uses scala-2.9.0.1-installer.jar, placed in the /tmp directory. Execute the following command at the command prompt:

$java -jar scala-2.9.0.1-installer.jar 
Welcome to the installation of scala 2.9.0.1! 
The homepage is at: http://scala-lang.org/ 
press 1 to continue, 2 to quit, 3 to redisplay 
................................................ 
[ Starting to unpack ] 
[ Processing package: Software Package Installation (1/1) ] 
[ Unpacking finished ] 
[ Console installation done ] 
$

The installer guides you through installing Scala on your Linux machine. During installation, accept the license agreement when asked, then provide the path where Scala will be installed. When the installation is complete, verify it by checking the version:

$scala -version 
Scala code runner version 2.9.0.1 -- Copyright 2002-2011, LAMP/EPFL 
$

Scala basics

The biggest syntactic difference between Scala and Java is that the ; line-end character is optional. A Scala program can be defined as a collection of objects that communicate by invoking each other's methods.

  • Object - Objects have states and behaviors. Example: a cat has states (color, name) as well as behaviors (eating). An object is an instance of a class.
  • Class - A class can be defined as a blueprint that describes the behaviors and states that objects of its type support.
  • Methods - A method is basically a behavior. A class can contain many methods. It is in methods that the logic is written, data is manipulated, and all the actions are executed.
  • Fields - Each object has its own unique set of instance variables, which are called fields. An object's state is created by the values assigned to these fields.
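
Here is a minimal sketch that puts these four terms together, using the cat example. The class and field names are illustrative only.

```scala
object BasicsSketch {
  // Class: the blueprint. name and color are fields holding each object's state.
  class Cat(val name: String, val color: String) {
    var meals = 0 // a mutable field, also part of the object's state

    // Method: a behavior that reads and changes the state.
    def eat(): String = {
      meals += 1
      s"$name the $color cat has eaten $meals time(s)"
    }
  }

  def main(args: Array[String]): Unit = {
    val tom = new Cat("Tom", "grey") // object: an instance of the class
    println(tom.eat())
    println(tom.eat()); // a trailing semicolon is allowed but optional
  }
}
```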