C Program to Count Characters, Words and Lines in a File

To get a quick summary of a file, such as the total number of characters, words and lines, Linux already has a tool, wc. Here we’ll see how to write a C program that reports similar information.

Strategy to Count Characters, Words, Lines in a File

  1. Take a file name as input and open that file in read-only mode. Don’t continue if the file can’t be opened.
  2. Read the file character by character until fgetc() returns EOF. EOF is not a character stored in the file; it is a special value the read function returns when there is nothing left to read.
    1. Increment the character count.
    2. If the character is not a white-space character, set a flag in_word to 1.
    3. If the character is a white-space and the in_word flag is 1, increment the word count and set the in_word flag to 0.
      1. If the character is either ‘\n’ or ‘\0’, increment the line count.
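
Step 2 hinges on how the end of the file is detected. fgetc() returns an int: either a byte value converted from unsigned char, or the negative constant EOF. The sketch below is a minimal illustration, assuming a hypothetical helper named count_bytes that is not part of the program that follows. It keeps the result in an int so that a byte such as 0xFF is still counted rather than mistaken for the end of the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for illustration: counts the bytes left in an
   already opened stream. */
long count_bytes(FILE *fp) {
  int ch;          /* int, not char, so EOF stays distinct from 0xFF */
  long count = 0;

  assert(fp != NULL);

  while ((ch = fgetc(fp)) != EOF) {
    count++;
  }
  return count;
}
```

If ch were declared as a plain char, the comparison with EOF could succeed too early where char is signed, or never succeed where char is unsigned.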

The Program

/*test.c*/

#include <stdio.h>
#define MAX_LEN 1024

int main() {
  /*Read the file.*/

  int ch; /* int, not char: fgetc() returns an int so that EOF differs from every valid byte */
  int char_count = 0, word_count = 0, line_count = 0;
  int in_word = 0;
  char file_name[MAX_LEN];
  FILE *fp;

  printf("Enter a file name: ");
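  /* Limit the read to MAX_LEN - 1 characters so file_name cannot overflow. */
  if (scanf("%1023s", file_name) != 1) {
    printf("Could not read a file name\n");
    return 1;
  }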

  fp = fopen(file_name, "r");

  if(fp == NULL) {
    printf("Could not open the file %s\n", file_name);
    return 1;
  }

  while ((ch = fgetc(fp)) != EOF) {
    char_count++;

    if(ch == ' ' || ch == '\t' || ch == '\0' || ch == '\n') {
      if (in_word) {
        in_word = 0;
        word_count++;
      }

      if(ch == '\0' || ch == '\n') line_count++;

    } else {
      in_word = 1;
    }
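  }

  /* Count the last word if the file does not end with white-space. */
  if (in_word) word_count++;

  fclose(fp);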

  printf("In the file %s:\n", file_name);
  printf("Number of characters: %d.\n", char_count);
  printf("Number of words: %d.\n", word_count);
  printf("Number of lines: %d.\n", line_count);

  return 0;
}

Here is the content of our sample text file (test.txt).

Electric communication will never
be a substitute for the face of
someone who with their soul encourages
another person to be brave and true

And here is the output of the program.

[Image: character, word and line count in a file]

Here the character count includes every character, white-space included. A word is a run of consecutive non-white-space characters, and a line ends with a ‘\0’ or ‘\n’ character.
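
These three definitions can also be packaged as a function that works on any open stream. This is a sketch under stated assumptions, not the program listed earlier: the struct file_counts and the function count_stream are hypothetical names, isspace() from <ctype.h> stands in for the explicit white-space comparisons, lines are counted on ‘\n’ only, and a word that runs right up to the end of the file is counted as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type for illustration. */
struct file_counts {
  long chars;
  long words;
  long lines;
};

/* Counts characters, words and lines from the current position of an
   already opened stream until fgetc() reports EOF. */
struct file_counts count_stream(FILE *fp) {
  struct file_counts c = {0, 0, 0};
  int in_word = 0;
  int ch; /* int so that EOF can be told apart from any byte */

  assert(fp != NULL);

  while ((ch = fgetc(fp)) != EOF) {
    c.chars++;

    if (isspace(ch)) {
      if (in_word) { /* a word has just ended */
        in_word = 0;
        c.words++;
      }
      if (ch == '\n') c.lines++;
    } else {
      in_word = 1;
    }
  }

  /* Assumption: a final word with no white-space after it still counts. */
  if (in_word) c.words++;

  return c;
}
```

For a stream holding the text "hello world\nfoo", this sketch reports 15 characters, 3 words and 1 line. The last line is not counted because no ‘\n’ follows it.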

Author: Srikanta

I write here to help readers learn and understand computer programming, algorithms, networking, OS concepts etc. in a simple way. I have 20 years of working experience in computer networking and industrial automation.

