To get a quick summary of a file, such as its total number of characters, words and lines, Linux already has a tool: wc. Here we’ll see how to write a C program that reports similar information.
Strategy to Count Characters, Words, Lines in a File
- Take a file name as input and open that file in read-only mode. Don’t continue if the file can’t be opened.
- Traverse the file character by character until fgetc returns EOF. EOF is not a character stored in the file; it is a special value that fgetc returns when there is nothing left to read.
- Increment the character count.
- If the character is not a white-space character, set a flag in_word to 1.
- If the character is a white-space and the in_word flag is 1, increment the word count and set the in_word flag to 0.
- If the character is either ‘\n’ or ‘\0’, increment the line count.
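The EOF step above is easy to get wrong, so here is a minimal sketch of the read loop on its own. It assumes nothing beyond the standard library; the helper name count_bytes is hypothetical and exists only for this illustration. The point it shows is that the result of fgetc is kept in an int, because EOF is a return value outside the range of ordinary byte values.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many characters fgetc() delivers
   from fp before it reports EOF. */
long count_bytes(FILE *fp) {
    int ch;      /* int, not char: must hold every byte value plus EOF */
    long n = 0;

    while ((ch = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

If ch were declared as char, a byte with value 0xFF could be mistaken for EOF on platforms where char is signed, and the comparison could never succeed where char is unsigned.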
The Program
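/*test.c*/
#include <stdio.h>
#define MAX_LEN 1024
int main(void) {
    int ch; /* int, not char, so that EOF differs from every byte value */
    int char_count = 0, word_count = 0, line_count = 0;
    int in_word = 0;
    char file_name[MAX_LEN];
    FILE *fp;
    /* Read the file name; the field width keeps scanf inside the buffer. */
    printf("Enter a file name: ");
    if (scanf("%1023s", file_name) != 1) {
        printf("Could not read a file name\n");
        return 1;
    }
    /* Open the file in read-only mode. */
    fp = fopen(file_name, "r");
    if (fp == NULL) {
        printf("Could not open the file %s\n", file_name);
        return 1;
    }
    /* Walk through the file one character at a time. */
    while ((ch = fgetc(fp)) != EOF) {
        char_count++;
        if (ch == ' ' || ch == '\t' || ch == '\0' || ch == '\n') {
            /* White space closes the word we were in, if any. */
            if (in_word) {
                in_word = 0;
                word_count++;
            }
            /* Compare with ==, not =, so that ch is not overwritten. */
            if (ch == '\0' || ch == '\n') line_count++;
        } else {
            in_word = 1;
        }
    }
    /* Count a last word that is not followed by white space. */
    if (in_word) word_count++;
    fclose(fp);
    printf("In the file %s:\n", file_name);
    printf("Number of characters: %d.\n", char_count);
    printf("Number of words: %d.\n", word_count);
    printf("Number of lines: %d.\n", line_count);
    return 0;
}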
Here is the content of our sample text file (test.txt).
Electric communication will never
be a substitute for the face of
someone who with their soul encourages
another person to be brave and true
And here is the output of the program.
[Image: character, word and line count in a file]
Here, the character count includes every character, white space included. A word is a run of consecutive non-white-space characters, and a line ends with a ‘\n’ or ‘\0’ character.
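To make these three definitions concrete, the sketch below applies the same rules to a buffer in memory instead of a file, so the result can be checked by hand. The struct counts type and the count_text function are hypothetical names introduced only for this illustration. It also counts a word that runs up to the end of the input, which is an assumption added here and not something the definitions above spell out.

```c
#include <stddef.h>

/* Hypothetical result type for this illustration. */
struct counts {
    int chars;
    int words;
    int lines;
};

/* Apply the counting rules to len bytes starting at buf. */
struct counts count_text(const char *buf, size_t len) {
    struct counts c = {0, 0, 0};
    int in_word = 0;

    for (size_t i = 0; i < len; i++) {
        char ch = buf[i];
        c.chars++;                  /* every byte counts, white space included */
        if (ch == ' ' || ch == '\t' || ch == '\0' || ch == '\n') {
            if (in_word) {          /* white space closes the current word */
                in_word = 0;
                c.words++;
            }
            if (ch == '\0' || ch == '\n') {
                c.lines++;          /* a line ends at '\n' or '\0' */
            }
        } else {
            in_word = 1;
        }
    }
    if (in_word) {
        c.words++;                  /* assumption: count a word cut off by end of input */
    }
    return c;
}
```

For the 12-byte input "to be\nbrave\n", count_text returns 12 characters, 3 words and 2 lines: both line breaks are counted as characters, and each one also closes a word and a line.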