The Performance Lab Solution

Description

5/5 – (2 votes)

Objectives

This assignment deals with optimizing memory intensive code. Image processing and matrix operations offer many examples of functions that can benefit from optimization. In this homework, we will consider Exposure Fusion and Gaussian Blur. Your objective in this homework is to optimize these functions as much as possible by using the concepts you have learned in class.

Exposure fusion merges multiple exposures into a single image. During the process, the pixels of each image are assigned weights according to their luminosity, saturation and contrast. However, you are not required to calculate the weights in this homework. The weight matrices will be provided to you. The simplified formula can be seen below:

−1

^Rij ⁼ ^Wk,ij ^∗ ^Ik,ij

k=0

where W_k is the kth weight matrix, I_k is the kth image matrix and R is the resulting matrix. For this homework, we will use 4 (differently exposed) images, so N = 4 for the above formula.

The second function, Gaussian blur, is used to reduce the noise in the images. Gaussian blur is also used when calculating weight matrices for exposure fusion. It can be applied by convolving the image with a Gaussian filter. For this homework we will use 5 × 5 Gaussian filter with a standard deviation of 1.

Convolution is done by calculating the sum of element-wise multiplication of the image patch and the filter, for each patch. The first two steps while convolving two matrices are represented in Figure 1.

Test case		1	2	3	4	5

Method	N	64	128	256	512	1024	Geom. Mean
Naive fusion (CPE)		7.4	7.4	7.3	7.9	9.5
Optimized fusion (CPE)		6.5	6.8	6.8	7.5	9.1
Speedup (naive/opt)		1.1	1.1	1.1	1.1	1.0	1.1


Method	N	64	128	256	512	1024	Geom. Mean
Naive blur (CPE)		43.3	46.0	53.3	67.0	69.4
Optimized blur (CPE)		22.9	24.3	31.6	31.8	33.2
Speedup (naive/opt)		1.9	1.9	1.7	2.1	2.1	1.9

Table 1: CPEs and Ratios for Optimized vs. Naive Implementations

}

As you can see from Figure 1, the resultant image is smaller than the original one. For simplicity, you should be filling the destination matrix dst starting from top left pixel and leave the rest untouched. dst is a zero matrix, initially.

Performance Measures

Our main performance measure is CPE or Cycles per Element. If a function takes C cycles to run for an image of size N × N , the CPE value is C/N². Table 1 summarizes the performance of the naive implementations shown above and compares it against an optimized implementation. Performance is shown for for 5 different values of N. All measurements were made on the the department computers (ineks).

The ratios (speed-ups) of the optimized implementation over the naive one will constitute a score of your implemen-tation. To summarize the overall effect over different values of N, we will compute the geometric mean of the results for these 5 values. That is, if the measured speed-ups for N = {64, 128, 256, 512, 1024} are R₆₄, R₁₂₈, R₂₅₆, R₅₁₂, and R₁₀₂₄ then we compute the overall performance as

R = ⁵ R₆₄ × R₁₂₈ × R₂₅₆ × R₅₁₂ × R₁₀₂₄

Assumptions

To make life easier, you can assume that N is a multiple of 32. Your code must run correctly for all such values of N, but we will measure its performance only for the 5 values shown in Table 1. For both functions, we will use square grayscale images.

Infrastructure

We have provided support code to help you test the correctness of your implementations and measure their perfor-mance. This section describes how to use this infrastructure. The exact details of each part of the assignment is described in the following section.

Note: The only source file you will be modifying is kernels.c.

Versioning

You will be writing many versions of the fusion and blur routines. To help you compare the performance of all the different versions you’ve written, we provide a way of “registering” functions.

For example, the file kernels.c that we have provided you contains the following functions:

void register_fusion_functions(){

add_fusion_function(&fusion, fusion_descr);

}

void register_blur_functions(){

add_blur_function(&blur, blur_descr);

}

This function contains one or more calls to add fusion function and add blur function. In one of the examples above, add fusion function registers the function fusion along with a string fusion descr which is an ASCII description of what the function does. See the file kernels.c to see how to create the string descriptions.

This string can be at most 256 characters long. Gaussian blur works the same way.

Driver

The source code you will write will be linked with object code that we supply into a driver binary. To create this binary, you will need to execute the command

unix> make driver

You will need to re-make driver each time you change the code in kernels.c. To test your implementations, you can then run the command:

unix> ./driver

The driver can be run in four different modes:

Default mode, in which all versions of your implementation are run.

Autograder mode, in which only the fusion() and blur() functions are run. This is the mode we will use when grading your solution.

File mode, in which only versions that are mentioned in an input file are run.

Dump mode, in which a one-line description of each version is dumped to a text file. You can then edit this text file to keep only those versions that you’d like to test using the file mode. You can specify whether to quit after dumping the file or if your implementations are to be run.

If run without any arguments, driver will run all of your versions (default mode). Other modes and options can be specified by command-line arguments to driver, as listed below:

-g : Run only fusion() and blur() functions (autograder mode).

-f <funcfile> : Execute only those versions specified in <funcfile> (file mode).

-d <dumpfile> : Dump the names of all versions to a dump file called <dumpfile>, one line to a version (dump mode).

-q : Quit after dumping version names to a dump file. To be used in tandem with -d. For example, to quit immediately after printing the dump file, type ./driver -qd <dumpfile>. -h : Print the command line usage.

Team Information

Important: Before you start, you should fill in the struct in kernels.c with information about your team (team name, student names, student ids).

Assignment Details

Optimizing Exposure Fusion (30 points)

In this part, you will optimize fusion to achieve as low a CPE as possible. You should compile driver and then run it with the appropriate arguments to test your implementations.

For example, running driver with the supplied naive version (for fusion) generates the output shown below:

unix> ./driver

Teamname: Team

Member 1: Student Name

ID 1: eXXXXXXX

fusion: Version = naive_fusion: Naive				baseline exposure fusion:
Dim	64	128	256	512	1024	Mean
Your CPEs	7.4	7.5	7.4	8.1	9.7
Baseline CPEs	7.4	7.4	7.3	7.9	9.5
Speedup	1.0	1.0	1.0	1.0	1.0	1.0

Optimizing Gaussian Blur (70 points)

In this part, you will optimize blur to achieve as low a CPE as possible. You should compile driver and then run it with the appropriate arguments to test your implementations.

For example, running driver with the supplied naive version (for blur) generates the output shown below:

unix> ./driver

Teamname: Team

Member 1: Student Name

ID 1: eXXXXXXX

blur: Version = naive_blur The naive baseline version of Gaussian blur:

Dim	64	128	256	512	1024	Mean
Your CPEs	43.2	46.0	54.0	67.0	69.9
Baseline CPEs	43.3	46.0	53.3	67.0	69.4
Speedup	1.0	1.0	1.0	1.0	1.0	1.0

Some advice. Look at the assembly code generated for the fusion and blur. Focus on optimizing the inner loop (the code that gets repeatedly executed in a loop) using the optimization tricks covered in class.

Coding Rules

You may write any code you want, as long as it satisfies the following:

It must be in ANSI C. You may not use any embedded assembly language statements.

It must not interfere with the time measurement mechanism. You will also be penalized if your code prints any extraneous information.

You can only modify code in kernels.c. You are allowed to define macros, additional global variables, and other procedures in this file.

Evaluation

Correctness: You will get NO CREDIT for buggy code that causes the driver to complain! This includes code that correctly operates on the test sizes, but incorrectly on image matrices of other sizes. As mentioned earlier, you may assume that the image dimension is a multiple of 32.

Note that you should only modify dst pointer. If you modify input pointers or exceed their limits, you will not get any credit.

Exposure Fusion: You will get 30 points if your implementation is correct and achieve a mean speed-up of 1.1. There is no scaling in this function.

Gaussian Blur: You will get 40 points for implementation of blur if it is correct and achieve mean speed-up threshold of 1.9. The team that achieves the highest speed-up will get 70 points. Other grades will be scaled between 40 and 70 according to your speed-up. You will not get any partial credit for a correct implementation that does below the threshold.

NOTE: In order to have an idea of what will be your grade, you should know about speed-ups of other students. Therefore, sharing your highest speed-ups is highly recommended.

Since there might be changes of performance regarding to CPU status, test the same code many times and take only the best into consideration. When your codes are evaluated, your codes will be tested in a closed environment many times and only your best speed-up of functions will be taken into account.

Optional: In this assignment, you have an option to create a team up to three people. However, you can also do it alone.

Submission

Submissions will be done via ODTUClass. You can only submit kernels.c function. Therefore, make sure all of your changes are only on this file. Any member of the team can submit the file. Do NOT submit the same file by different members of the team. One submission is enough.

Share this:

Share this:

Description

Share this:

Related products

ASSIGNMENT 03

Homework 7 Solution

Assignment-(H) Solution

Simulation project Task 1- Hand simulation Solution

Problem Set 4 Solution