Coursework: Week 4, Strings

The assignment this week does not count towards your credit for the unit. The main purpose is to learn to write functions which involve arrays, strings and loops. You may need information from the lecture notes up to the strings chapter.

understand the task

You are asked to write your own versions of some of the string functions in the standard library. Each function should be implemented directly, without calling any library functions. (The functions can call each other.) The declarations of the functions you are asked to write are:

int length(const char s[]);
void copy(char t[], const char s[]);
int compare(const char s[], const char t[]);
void append(char t[], const char s[]);
int find(const char t[], const char s[]);

These are intended to be custom versions of strlen, strcpy, strcmp, strcat and strstr.The tests will provide the detailed specifications of the functions.

study the skeleton program

You are given this skeleton program and Makefile as a starting point:

strings.c
Makefile

Here are some notes:

Compile and run as usual

Type:

make
./strings
There is no pointer notation

Array notation is used rather than pointer notation. You don't need to use any pointer notation in your answers. (It is traditional to use pointer notation in functions like this, but array notation is arguably superior, because it is more readable and can actually be more efficient nowadays, because the compiler with the -O2 option has more information about what you mean so it can optimise better.)

The program only does testing

The program doesn't do anything, except run tests on the functions. This is what you would do if you were developing a library module. The only additions for a real library module would be (a) you would add a header strings.h, like the standard one, for users to include and (b) you would rename main to avoid a clash with a program that used your library. You will see that in detail in later assignments.

The tests form a specification

Read the tests to find the detailed specification of what each function has to achieve. Note that, for the compare function, you only need to return a negative, zero or positive number, not necessarily -1, 0, +1. Historically, that's for efficiency reasons because you would have to add an extra unnecessary step to normalise. A couple of the original functions unnecessarily return the result string - the custom versions don't, because it is not possible using only array notation. The find function returns an int position rather than a substring for the same reason.

The types are simplifed

For example, the return type of length is int, rather than size_t. That means it won't work for super-long strings. However, for a program which handles such strings, you would probably want to design a custom string type anyway.

The const keyword is used

The const keyword is optional. However, it has advantages. For one thing, it helps you to implement the functions, because the compiler will detect any mistaken attempt to change a string which you are not supposed to change. Also, it helps the compiler to optimise the code.

develop the program

As usual, it is a case of working in small steps, one test at a time, keeping your program in a working state. Here are some things to think about:

Don't call library functions

The assignment is about writing loops. Obviously, calling strlen to make length easy isn't allowed. As well as the tests in the skeleton, some further tests will be made on your program while marking to try to ensure that you haven't used any library functions.

Efficiency

The efficiency of your functions won't be tested in the marking. But efficiency is tradionally a big concern in writing reusable library functions. Don't agonize over efficiency, especially if it is at the expense of readability or correctness. I suggest you just give a little thought to how many times each function loops through the relevant characters. (If you are fascinated by efficiency, you could do timings or look at the compiled code in assembly language format, but that is definitely not required.)

Trust the compiler

Suppose you write a loop for (int i=0; i<length(s); ...).... You might think that the function length is called each time round the loop. Since length itself involves a loop, that would be inefficient. But if s is declared const and you are using the -O2 option to compile, the compiler will notice that length(s) only needs to be called once.

Use the null terminator

If you write for (int i=0; i<length(s); ...)... then there are two loops through the characters, one in the length function, and the one you've just written. You may be able to get away with one loop by writing for (int i=0; s[i] != '\0'; ...)....

Pre-declare the loop variable

If you write for (int i=0;...)... there may be one more thing you need to do, at the end of the string. But the variable i is no longer available, because it is local to the loop. If instead you write int i; ... for (i=0;...)... then i is available after the loop ends, and may have a useful value in it.

write something extra

If you have time, and if you are interested, you can write another C program. Write any program you want, of the same sort of scope as the string program, involving arrays and loops.

Submit your source file, plus a readme.txt to explain what you have done, or if you want to catch my attention.

submit

The tests in the string function count for 50% of the assignment. The filename must be strings.c precisely. Marks will be taken off if the filename is wrong, or library function calls are detected. The other 50% comes from a quick look at any extension program you submit.