Coursework: Week 7, Text

The main purpose of this week's assignment is to practice pointers and memory management. The assignment may provide you with some insight about how modules work, and about how strings are implemented in other languages. You may need information from the lectures up to the memory chapter.

If you are still using Windows, you will be at a disadvantage for this assignment. That's because there could be bugs in your program which you don't find because the advanced debugging options -fsanitize=undefined and -fsanitize=address aren't available to you.

understand the task

You are asked to write some functions which handle strings, wrapped as text objects to add pointer and memory management. The functions you are asked to write are:

text *newText(char *s);
void freeText(text *t);
int length(text *t);
char get(text *t, int i);
void set(text *t, int i, char c);
text *copy(text *t);
int compare(text *t1, text *t2);
void append(text *t1, text *t2);
text *slice(text *t, int i, int j);
int find(text *t, text *p);

These functions form a potentially reusable library. The type text is a custom type set up for this assignment. It roughly matches how string types are implemented in other languages. As usual, the tests will provide the detailed specifications of the functions.

understand the text type

The text type is declared in text.h and defined in text.c as:

struct text {
    int capacity;
    char *content;
};
typedef struct text text;

Here's a couple of notes:

It defines a dynamic array

The structure contains a dynamically allocated character array and its capacity. The capacity is the total size of the array, not the length of the string stored in it. The length of the string in it is determined by a terminating null character as usual, so strlen can be used to find it.

The typedef statement defines the name text as an abbreviation for the full type name struct text.

There are two pointers involved

A variable declared as text *t involves two pointers. The first is t itself, which points to a structure. Its purpose is to allow the structure to be updated by the functions it is passed to, for example set doesn't have to return anything.

The second pointer is t->content which points to the allocated array. Its purpose is to allow the array to change size, without the structure having to move so, for example, append doesn't have to return anything.

study the skeleton

You are provided with three files.

text.h
text.c
Makefile

Download the files carefully

Download the three files. The file Makefile may end up as Makefile.txt, and you will need to rename it as Makefile with no extension (with command mv Makefile.txt Makefile).

The functions form a reasonably realistic library module

You are being asked to write a program in almost the same way as in previous assignments. The differences are that a header has been added, and that the test and main functions are included conditionally, so that the assignment roughly mimics a real library module.

Conditional compilation is used

The program defines its test code between lines #ifdef test_text and #endif. That's called conditional compilation.

The idea is that the variable test_text is defined when testing the module, and then the test and main functions are included. If the module were used in a bigger program, the variable would not be defined, and the test and main functions would be left out, to avoid unnecessary bulk and to avoid a name clash with the program's main function.

The Makefile has been upgraded to support module testing. The compile command now includes the option -Dtest_text, which causes test_text to be defined as if with #define. You need to use this option, even if you don't use the Makefile.

Keep the header

The header file text.h allows other programs to use the text module. It is not important for this assignment, except that the text.c file includes it, to provide a cross-check that the declarations in the header file match the definitions. So, you have to keep it in the same directory as text.c while you are working on the assignment, but you shouldn't change it.

The header is included with the line #include "text.h", which has double quotes instead of angle brackets to indicate that the header is included from the current directory, rather than the system header directory.

You will notice that most of the comments are in text.h rather than text.c. That's because a programmer using the module would be expected to read the header for documentation, and to have little interest in the implementation details. The header forms the API for the module.

The structure definition is split

The declaration and the typedef are in text.h, and the definition is in text.c.

Text objects are opaque

In the header file, the text structure is declared, but not defined. That means a program that uses the text module cannot 'see inside' the text structure, so cannot access its fields, or even find out how big it is to allocate memory for it. The program can only call the official functions provided in the header file.

This is called encapsulation. It makes sure that programs do not depend on any details of how the text module is implemented, allowing the text module to be developed independently, and changed without having to re-write any programs that use it. It also allows the module to provide safety guarantees.

Access functions are provided

Since a program can only use the functions provided, these have to cover everything. In particular, get has to be provided as a replacement for s[i], because the characters in the text cannot be accessed directly. Similarly, set is provided to update the i'th character.

The get and set functions can be made just as efficient as direct array indexing. You aren't required to do that, but the best way would be to declare them as extern inline and add the -flto compiler option to do cross-module optimization.

develop the module

Develop the module, function by function and test by test, as usual, noting:

The allocation policy is fixed

It is normal to allocate a small amount of memory initially, and then multiply by some factor where necessary.

You are asked to allocate 24 bytes initially, and to multiply by 2 where necessary, so that the tests can check the capacity explicitly.

Why 24? Because experiments show that the smallest memory block allocated by malloc has room for 24 bytes, so asking for less than that gains nothing. Why 2? Because it is the simplest choice, though something smaller such as 1.5 can sometimes work slightly better.

When implementing newText, be careful in case the initialization string needs more than 24 bytes, not forgetting to include the null marker in the count.

Also note that newText needs two calls to malloc.

You can use standard string functions

When you are implementing, you can use standard string functions. For example, in the definition of length you can use strlen. This assignment is not about rewriting the standard functions, it is about encapsulation and memory management.

Focus on safety

Your functions should ensure that the string in a text object always has a null character at the end, and always fits in its array. Of course, such guarantees depend on all of the functions you provide being implemented completely correctly.

Ideally, the get and set functions should check that the index passed is within range. You can do that with an assert. (Asserts can be switched off for production compiling, so that they have no effect on final efficiency.)

Testing is limited

Even with the advanced debugging options, it is impossible to test dynamic allocation perfectly.

For example, some of the tests check the value of t->capacity, but they can't test that the value matches the amount you actually allocated in your call to malloc. And freeText can't be tested at all, other than via the -fsanitize=address option.

As another example, it is impossible to test that the get and set functions check that their indexes are in range.

write something extra

Develop your own extra program as usual, for the second half of the marks. Ideally, you should use the same development technique. Design ultra-short functions, with logic separated from input output. You will have to develop your own tests for the logic functions. It is normal to do the tests for a function at the same time as the function itself. But don't leave the testing until afterwards. You don't need as many tests as in assignment skeletons, because it is only your confidence in correctness that is at stake.

submit

Submit text.c and your extra program, if any, along with a brief readme file. You don't need to submit text.h or Makefile.