This paper presents a human action recognition method. It analyzes spatio-temporal grids along dense trajectories and computes histograms of oriented gradients (HOG) and histograms of optical flow (HOF) to describe the appearance and motion of the human subject. The combined HOG and HOF descriptors are then converted into a bag-of-words (BoW) representation via a vocabulary tree. Finally, a random forest classifier recognizes the type of human action. In the experiments, the KTH and URADL databases are used for performance evaluation. Compared with other approaches, ours performs better on action videos with high intra-class and low inter-class variability.
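As a rough illustration of the bag-of-words step described above, the sketch below quantizes combined HOG+HOF descriptors against a codebook and builds a normalized histogram. It uses a flat nearest-neighbor assignment rather than the paper's vocabulary tree, and all array sizes and names are illustrative, not taken from the paper.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize each descriptor to its nearest vocabulary word and
    return an L1-normalized bag-of-words histogram."""
    # Pairwise squared distances between descriptors and codewords: (n, k).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    # Assign each descriptor to its nearest codeword.
    words = d2.argmin(axis=1)
    # Count word occurrences and normalize to a probability histogram.
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
desc = rng.normal(size=(500, 96))   # toy HOG+HOF descriptors (illustrative sizes)
vocab = rng.normal(size=(50, 96))   # toy codebook, a flat stand-in for a vocabulary tree
h = bow_histogram(desc, vocab)
```

The resulting fixed-length histogram `h` is the kind of per-video feature vector that the random forest classifier would then consume; a vocabulary tree would produce the same assignment in logarithmic rather than linear time in the codebook size.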