Here we’ll see how to count the occurrences of a substring in a string. The C string library (<string.h>) provides the strstr() function to check whether a substring is present, but it does not tell you how many times the substring occurs. We’ll implement our own counting function, first without strstr() and then with it.
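To make the limitation concrete, here is a minimal sketch of what strstr() reports on its own. The helper name first_match_offset() is hypothetical and only for illustration: it wraps strstr() and turns the returned pointer into an offset, or -1 when strstr() returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first match, or -1 if there is none. */
long first_match_offset(const char *string, const char *substring) {
    const char *p = strstr(string, substring);  /* NULL when not found */

    if (p == NULL) {
        return -1;
    }
    return (long)(p - string);  /* distance from the start of the string */
}
```

For "hello world" and "o" this returns 4, the position of the first "o". The second "o" is never reported, which is why counting needs a loop.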
The Program
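/* test.c */
#include <stdio.h>
#include <string.h>

#define MAX_STRING_SIZE 1024

/* Count the non-overlapping occurrences of substring in string. */
int substring_count(const char *string, const char *substring) {
    int i, j, l1, l2;
    int count = 0;
    int found = 0;

    l1 = (int)strlen(string);
    l2 = (int)strlen(substring);

    /* An empty substring would make the skip below step backwards
       and loop forever, so treat it as zero matches. */
    if (l2 == 0) {
        return 0;
    }

    for (i = 0; i < l1 - l2 + 1; i++) {
        found = 1;
        for (j = 0; j < l2; j++) {
            if (string[i + j] != substring[j]) {
                found = 0;
                break;
            }
        }
        if (found) {
            count++;
            /* Skip past the match; the loop's i++ supplies the final step. */
            i = i + l2 - 1;
        }
    }
    return count;
}

int main(void) {
    char string[MAX_STRING_SIZE];
    char substring[MAX_STRING_SIZE];
    int count = 0;

    /* fgets() is used instead of gets(), which cannot limit the input
       length and was removed in C11. */
    printf("Enter a string: ");
    if (fgets(string, sizeof string, stdin) == NULL) {
        return 1;
    }
    string[strcspn(string, "\n")] = '\0';  /* drop the trailing newline */

    printf("Enter a substring: ");
    if (fgets(substring, sizeof substring, stdin) == NULL) {
        return 1;
    }
    substring[strcspn(substring, "\n")] = '\0';

    count = substring_count(string, substring);
    printf("Substring occurrence count is: %d.\n", count);
    return 0;
}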
This program takes two strings, the string and the substring, as input. Then it calls the substring_count() function to count the occurrences of the substring. The function iterates through the string character by character and treats every position as a possible start of the substring, comparing the substring's characters from there. If a mismatch is found, the inner loop breaks and the outer loop moves on to the next character to repeat the same process. If all the characters match in the inner loop, a match has been found and the counter is incremented. The outer loop then advances by the length of the substring, so matches are counted without overlapping.
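One consequence of advancing by the length of the substring is that overlapping matches are skipped: "aa" is counted twice in "aaaa", not three times. If overlapping matches should count, the skip can simply be left out. The following is a sketch of that variant, not part of the original program; the name substring_count_overlapping() is hypothetical, and returning 0 for an empty substring is an assumption.

```c
#include <string.h>

/* Hypothetical variant: counts overlapping occurrences as well. */
int substring_count_overlapping(const char *string, const char *substring) {
    int i, j;
    int l1 = (int)strlen(string);
    int l2 = (int)strlen(substring);
    int count = 0;

    if (l2 == 0) {
        return 0;  /* assumption: an empty substring counts as zero matches */
    }

    for (i = 0; i < l1 - l2 + 1; i++) {
        /* Compare the substring against the characters starting at i. */
        for (j = 0; j < l2; j++) {
            if (string[i + j] != substring[j]) {
                break;
            }
        }
        if (j == l2) {
            count++;  /* no skip here: the next position is tried as usual */
        }
    }
    return count;
}
```

With this variant, "aa" in "aaaa" gives 3, while inputs without overlapping matches, such as "abc" in "abcabc", still give the same count of 2.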
Using strstr() Function
The substring_count() function is changed like this.
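/* Count the non-overlapping occurrences of substring in string. */
int substring_count(const char *string, const char *substring) {
    int i, l1, l2;
    int count = 0;

    l1 = (int)strlen(string);
    l2 = (int)strlen(substring);

    /* An empty substring matches at every position and would make the
       skip below step backwards, so treat it as zero matches. */
    if (l2 == 0) {
        return 0;
    }

    for (i = 0; i < l1 - l2 + 1; i++) {
        /* A match starts at position i only if strstr() returns
           exactly the pointer we passed in. */
        if (strstr(string + i, substring) == string + i) {
            count++;
            i = i + l2 - 1;
        }
    }
    return count;
}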
Instead of having our own inner loop to check whether the substring matches from a particular position of the outer loop, we use the strstr() function. The strstr() function returns a pointer to the first character of the first match, or NULL if there is no match. If that pointer equals the pointer to the current position of the outer loop, the substring starts exactly there and the counter is incremented.
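Because strstr() already searches forward, the outer character-by-character loop is not strictly needed. As an alternative sketch (not the version shown above), the search can jump straight from one match to the next. The name substring_count_strstr() is hypothetical, and returning 0 for an empty substring is an assumption.

```c
#include <string.h>

/* Hypothetical alternative: let strstr() find each next match directly. */
int substring_count_strstr(const char *string, const char *substring) {
    size_t l2 = strlen(substring);
    const char *p = string;
    int count = 0;

    if (l2 == 0) {
        return 0;  /* assumption: an empty substring counts as zero matches */
    }

    /* Each call resumes the search where the previous match ended. */
    while ((p = strstr(p, substring)) != NULL) {
        count++;
        p += l2;  /* move past this match, so matches do not overlap */
    }
    return count;
}
```

This gives the same non-overlapping counts, for example 3 for "a" in "aidshjkwakhka" and 2 for "aa" in "aaaa", and it avoids calling strstr() again at positions where no match can start.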
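We can achieve the same thing with the strncmp() function as well, by comparing only the first l2 characters at each position. Plain strcmp() would not work here, because it compares the whole remaining string rather than just a prefix of it.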
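A comparison-based version might look like the sketch below. It uses strncmp() limited to the length of the substring, since a comparison of the full remaining string would only succeed when the match sits at the very end. The name substring_count_strncmp() is hypothetical, and returning 0 for an empty substring is an assumption.

```c
#include <string.h>

/* Hypothetical variant: compare a fixed-length prefix at each position. */
int substring_count_strncmp(const char *string, const char *substring) {
    int i;
    int l1 = (int)strlen(string);
    int l2 = (int)strlen(substring);
    int count = 0;

    if (l2 == 0) {
        return 0;  /* assumption: an empty substring counts as zero matches */
    }

    for (i = 0; i < l1 - l2 + 1; i++) {
        /* strncmp() returns 0 when the first l2 characters are equal. */
        if (strncmp(string + i, substring, (size_t)l2) == 0) {
            count++;
            i = i + l2 - 1;  /* skip past the match, as before */
        }
    }
    return count;
}
```

Unlike the strstr() check, each call here inspects at most l2 characters, so no position causes a scan of the rest of the string.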
In the loop, i < l1 - l2 should actually be i < l1 - l2 + 1. In the current setup, the last character is not counted. Try e.g. with something like "aidshjkwakhka"; it will return 2 instead of 3.
You are absolutely right. Thanks for pointing that out. Corrected.