The main purpose of this week's assignment is to practice pointers and memory management. The assignment may provide you with some insight about how modules work, and about how strings are implemented in other languages. You may need information from the lectures up to the memory chapter.
If you are still using Windows, you will be at a disadvantage for this
assignment. That's because there could be bugs in your program which you don't
find because the advanced debugging options -fsanitize=undefined
and -fsanitize=address
aren't available to you.
You are asked to write some functions which handle strings, wrapped as text objects to add pointer and memory management. The functions you are asked to write are:
text *newText(char *s); void freeText(text *t); int length(text *t); char get(text *t, int i); void set(text *t, int i, char c); text *copy(text *t); int compare(text *t1, text *t2); void append(text *t1, text *t2); text *slice(text *t, int i, int j); int find(text *t, text *p);
These functions form a potentially reusable library. The type
text
is a custom type set up for this assignment. It roughly
matches how string types are implemented in other languages. As usual, the tests
will provide the detailed specifications of the functions.
The text type is declared in text.h
and defined in
text.c
as:
struct text { int capacity; char *content; }; typedef struct text text;
Here's a couple of notes:
The structure contains a dynamically allocated character array and its capacity.
The capacity is the total size of the array, not the length of the string stored
in it. The length of the string in it is determined by a terminating null
character as usual, so strlen
can be used to find it.
The typedef
statement defines the name
text
as an abbreviation for the full type name struct
text
.
A variable declared as text *t
involves two pointers. The first
is t
itself, which points to a structure. Its purpose is to allow
the structure to be updated by the functions it is passed to, for example
set
doesn't have to return anything.
The second pointer is
t->content
which points to the allocated array. Its purpose is to allow
the array to change size, without the structure having to move so, for example,
append
doesn't have to return anything.
You are provided with three files.
Download the three files. The file
Makefile
may end up as Makefile.txt
, and you
will need to rename it as Makefile
with no extension (with
command mv Makefile.txt Makefile
).
You are being asked to write a program in almost the same way as in previous
assignments. The differences are that a header has been added, and that the test
and main
functions are included conditionally, so that the
assignment roughly mimics a real library module.
The program defines its test code between lines #ifdef
test_text
and #endif
. That's called conditional
compilation.
The idea is that the variable test_text
is defined when testing the module, and then
the test and
main
functions are included. If the module were used in a
bigger program, the variable would not be defined, and the test and
main
functions would
be left out, to avoid unnecessary bulk and to avoid a name clash with the
program's main
function.
The Makefile has been upgraded to support module testing.
The compile command now includes the option -Dtest_text
, which causes
test_text
to be defined as if with #define
. You need
to use this option, even if you don't use the Makefile.
The header file text.h
allows other programs to use the
text module. It is not important for this assignment, except that the
text.c
file includes it, to provide a
cross-check that the declarations in the header file match the definitions. So,
you have to keep it in the same directory as text.c
while you are
working on the assignment, but you shouldn't change it.
The header is included with the line #include "text.h"
, which
has double quotes instead of angle brackets to indicate that the header is
included from the current directory, rather than the system header
directory.
You will notice that most of the comments are in text.h
rather
than text.c
. That's because a programmer using the module would
be expected to read the header for documentation, and to have little interest
in the implementation details. The header forms the
API
for the module.
The declaration and the typedef are in text.h
, and the
definition is in text.c
.
In the header file, the text
structure is declared, but not
defined. That means a program that uses the text module cannot 'see inside'
the text structure, so cannot access its fields, or even find out how
big it is to allocate memory for it. The program can only call
the official functions provided in the header file.
This is called encapsulation. It makes sure that programs do not depend on any details of how the text module is implemented, allowing the text module to be developed independently, and changed without having to re-write any programs that use it. It also allows the module to provide safety guarantees.
Since a program can only use the functions provided, these have to cover
everything. In particular, get
has to be provided as a
replacement for s[i]
, because the characters in the text
cannot be accessed directly. Similarly, set
is provided
to update the i'th character.
The get
and set
functions can be made just as
efficient as direct array indexing. You aren't required to do that, but the
best way would be to declare them as extern inline
and add
the -flto
compiler option to do
cross-module optimization.
Develop the module, function by function and test by test, as usual, noting:
It is normal to allocate a small amount of memory initially, and then multiply by some factor where necessary.
You are asked to allocate 24 bytes initially, and to multiply by 2 where necessary, so that the tests can check the capacity explicitly.
Why 24? Because experiments show that the smallest memory block allocated by
malloc
has room for 24 bytes, so asking for less than that gains
nothing. Why 2? Because it is the simplest choice, though something smaller such
as 1.5 can sometimes work slightly better.
When implementing newText
, be careful in case the initialization
string needs more than 24 bytes, not forgetting to include the null marker in
the count.
Also note that newText
needs two calls to malloc
.
When you are implementing, you can use standard string
functions. For example, in the definition of length
you can
use strlen
. This assignment is not about rewriting the
standard functions, it is about encapsulation and memory management.
Your functions should ensure that the string in a text
object
always has a null character at the end, and
always fits in its array. Of course, such guarantees depend
on all of the functions you provide being implemented completely correctly.
Ideally, the get
and set
functions should check
that the index passed is within range. You can do that with an assert.
(Asserts can be switched off for production compiling, so that they have no
effect on final efficiency.)
Even with the advanced debugging options, it is impossible to test dynamic allocation perfectly.
For example, some of the tests check the value of t->capacity
,
but they can't test that the value matches the amount you actually allocated in
your call to malloc
. And freeText
can't be tested at
all, other than via the -fsanitize=address
option.
As another example, it is impossible to test that the get
and
set
functions check that their indexes are in range.
Develop your own extra program as usual, for the second half of the marks. Ideally, you should use the same development technique. Design ultra-short functions, with logic separated from input output. You will have to develop your own tests for the logic functions. It is normal to do the tests for a function at the same time as the function itself. But don't leave the testing until afterwards. You don't need as many tests as in assignment skeletons, because it is only your confidence in correctness that is at stake.
Submit text.c
and your extra program, if any, along with a brief
readme file. You don't need to submit text.h
or
Makefile
.