AWS: Java S3 Upload


If you want to push data to AWS S3, there are a few different ways to do it. I will show you two that I have used: a simple putObject call, and a multipart upload via the TransferManager, which is better suited to large files.

Option 1: putObject

import java.io.InputStream;

import com.amazonaws.AmazonClientException;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;

//The timeout and retry constants are placeholders; define values that suit your environment
ClientConfiguration config = new ClientConfiguration();
config.setSocketTimeout(SOCKET_TIMEOUT);
config.setMaxErrorRetry(RETRY_COUNT);
config.setClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT);
config.setRequestTimeout(REQUEST_TIMEOUT);
config.setConnectionTimeout(CONNECTION_TIMEOUT);

AWSCredentialsProvider credProvider = ...;
String region = ...;

AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
		.withCredentials(credProvider)
		.withRegion(region)
		.withClientConfiguration(config)
		.build();

InputStream stream = ...;
String bucketName = ...;
String keyName = ...;
String mimeType = ...;

//You use metadata to describe the data.
final ObjectMetadata metaData = new ObjectMetadata();
metaData.setContentType(mimeType);

//putObject has several overloads. Pick the one that suits your needs.
try {
	s3Client.putObject(bucketName, keyName, stream, metaData);
} catch (final AmazonClientException ex) {
	//Log the exception
}

Option 2: Multipart Upload

import java.io.InputStream;

import com.amazonaws.AmazonClientException;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.event.ProgressEvent;
import com.amazonaws.event.ProgressEventType;
import com.amazonaws.event.ProgressListener;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

ClientConfiguration config = new ClientConfiguration();
config.setSocketTimeout(SOCKET_TIMEOUT);
config.setMaxErrorRetry(RETRY_COUNT);
config.setClientExecutionTimeout(CLIENT_EXECUTION_TIMEOUT);
config.setRequestTimeout(REQUEST_TIMEOUT);
config.setConnectionTimeout(CONNECTION_TIMEOUT);

AWSCredentialsProvider credProvider = ...;
String region = ...;

AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
		.withCredentials(credProvider)
		.withRegion(region)
		.withClientConfiguration(config)
		.build();

InputStream stream = ...;
String bucketName = ...;
String keyName = ...;
long contentLength = ...;
String mimeType = ...;

//Use metadata to describe the data. The content length is required so the multipart upload knows how big the object is.
final ObjectMetadata metaData = new ObjectMetadata();
metaData.setContentLength(contentLength);
metaData.setContentType(mimeType);

TransferManager tf = TransferManagerBuilder.standard().withS3Client(s3Client).build();
tf.getConfiguration().setMinimumUploadPartSize(UPLOAD_PART_SIZE);
tf.getConfiguration().setMultipartUploadThreshold(UPLOAD_THRESHOLD);

final Upload xfer = tf.upload(bucketName, keyName, stream, metaData);

ProgressListener progressListener = new ProgressListener() {
	public void progressChanged(ProgressEvent progressEvent) {
		if (progressEvent.getEventType() == ProgressEventType.TRANSFER_FAILED_EVENT
				|| progressEvent.getEventType() == ProgressEventType.TRANSFER_PART_FAILED_EVENT) {
			//Log the failure
		}
	}
};

xfer.addProgressListener(progressListener);

try {
	xfer.waitForCompletion();
} catch (final AmazonClientException | InterruptedException ex) {
	//Log the exception
} finally {
	//Release the TransferManager's threads; passing false keeps the underlying S3 client open
	tf.shutdownNow(false);
}

AWS: Java S3 Lambda Handler


If you want to write an AWS Lambda in Java that connects to S3, you need a handler method.

Maven:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.11.109</version>
</dependency>
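
Note that the Context and S3Event types used below do not live in the SDK artifact itself but in the Lambda runtime libraries, so you will most likely also need dependencies along these lines (the version numbers are indicative only):

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-lambda-java-core</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-lambda-java-events</artifactId>
    <version>1.3.0</version>
</dependency>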

This is the method that AWS Lambda will call. It will look similar to the one below.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.event.S3EventNotification.S3Entity;
import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;

public void S3Handler(S3Event s3e, Context context) {
	final String awsRequestId = context.getAwsRequestId();
	final int memoryLimitMb = context.getMemoryLimitInMB();
	final int remainingTimeInMillis = context.getRemainingTimeInMillis();

	for (final S3EventNotificationRecord s3Rec : s3e.getRecords()) {
		final S3Entity record = s3Rec.getS3();
		
		final String bucketName = record.getBucket().getName();
		final String key = record.getObject().getKey();
	}
}

The thing to note when you set up your Lambda is how to fill in the “Handler” field in the “Configuration” section on AWS. It is in the format “##PACKAGE##.##CLASS##::##METHOD##”.
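
As a hypothetical example, if the S3Handler method above lived in a class called S3LambdaProcessor in the package com.example.lambda, the “Handler” field would be:

com.example.lambda.S3LambdaProcessor::S3Handler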

AWS: Python S3


If you haven’t already done so, please refer to the AWS setup section, which is part of this series. As time goes on I will continually update this section.

To work with S3 you need to utilise the “connection_s3” object you already set up in the previous tutorial on setting up the connection.

To load a file from an S3 bucket you need to know the bucket name and the file name.

connection_s3.Object(##S3_BUCKET##, ##FILE_NAME##).load()

If you want to check whether a file exists on S3, you can do something like the below. Note that you will need to import botocore.

import botocore

def keyExists(key):
    file = connection_s3.Object(##S3_BUCKET##, key)

    try:
        #load() issues a HEAD request and raises a ClientError if the object does not exist
        file.load()
    except botocore.exceptions.ClientError as e:
        exists = False
    else:
        exists = True

    return exists, file

If you want to copy a file from one bucket to another, or into a subfolder, you can do it like below.

connection_s3.Object(##S3_DESTINATION_BUCKET##, ##FILE_NAME##).copy_from(CopySource=##S3_SOURCE_BUCKET## + '/' + ##FILE_NAME##)

If you want to delete the file you can use the “keyExists” function above and then just call “delete”.

##FILE##.delete()
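
Putting the two together, a minimal sketch (reusing the “keyExists” function and placeholders from above) looks like this:

exists, file = keyExists(##FILE_NAME##)
if exists:
    file.delete()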

If you just want to get a bucket object, you only need to specify the bucket and use the S3 connection.

bucket = connection_s3.Bucket(##S3_BUCKET##)

To upload a file to an S3 bucket, you need to set the body, the content type and the key name. Take a look at the below example.

bucket.put_object(Body=##DATA##, ContentType="application/zip", Key=##FILE_NAME##)

If you want to loop over the objects in a bucket, it’s pretty straightforward.

for key in bucket.objects.all():
    file_name = key.key
    response = key.get()
    data = response['Body'].read()

If you want to filter the objects in a bucket, you can use a prefix.

for key in bucket.objects.filter(Prefix='##PREFIX##'):
    file_name = key.key
    response = key.get()
    data = response['Body'].read()

AWS: Python Setup


When you want to work with S3 or a Kinesis stream, you first need to set up the connection. At the time of this writing I am using boto3 version 1.3.1.

Next we need to import the package.

import boto3
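
If you want to confirm which version of boto3 you have installed, a quick check is to print the package version:

print(boto3.__version__)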

Next we set up the session and specify which profile we will be using.

profile = boto3.session.Session(profile_name='prod')

The profile name comes from the “credentials” file. You can set the environment variable “AWS_SHARED_CREDENTIALS_FILE” to specify which credentials file to use. You can set up the credentials file like below. You can change the “local” to anything you want; I normally use “stage”, “dev” or “prod”.

[local]
aws_access_key_id=##KEY_ID##
aws_secret_access_key=##SECRET_ACCESS_KEY##
region=##REGION##

Next we need to set up the connection to S3. To do this we will use the profile we created above.

connection_s3 = profile.resource('s3')

If we also want to use a Kinesis stream, then we need to set up that connection too. Again we will use the profile we created above.

connection_kinesis = profile.client('kinesis')
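
As a quick sanity check that the profile is wired up correctly, you can list what the credentials can see. This is a minimal sketch assuming the “connection_s3” and “connection_kinesis” objects above:

#List the buckets visible to these credentials
for bucket in connection_s3.buckets.all():
    print(bucket.name)

#List the Kinesis streams in the region
print(connection_kinesis.list_streams()['StreamNames'])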