Today, model checking is one of the essential techniques for verifying software systems. It can check properties such as reachability, in which the entire state space is searched for a desired state. However, model checking may suffer from the state space explosion problem, in which not all states can be generated because resource usage grows exponentially. Although the results of recent model checking approaches are promising, there is still room for improvement in terms of accuracy and the number of explored states. In this paper, using deep reinforcement learning and two neural networks, we propose an approach to increase the accuracy of the generated witnesses and reduce the use of hardware resources. In this approach, an agent starts exploring the state space without any prior knowledge and gradually learns to distinguish proper from improper actions through the rewards and penalties it receives from the environment while pursuing the goal. Once the dataset is filled with the agent's experiences, two neural networks evaluate the quality of each action in each state, and the best action is then selected. The main implementation challenges are state encoding, feature engineering, feature selection, reward engineering, handling invalid actions, and configuring the neural networks. Finally, the proposed approach has been implemented in the Groove toolset and, in most of the case studies, it overcomes the state space explosion problem. This approach also outperforms the existing solutions in generating shorter witnesses and exploring fewer states: on average, it is nearly 400% better than the other approaches in terms of the number of explored states and 300% better in terms of witness length. In addition, on average, the proposed approach is 37% more accurate than the others in finding the goal state.
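The abstract does not fix a concrete architecture, so the following is only a minimal sketch of one plausible reading of "two neural networks evaluate the quality of each action in each state": a double-DQN-style setup in which an online network scores the enabled actions of an encoded state and a target network evaluates the chosen action. The state dimension, action bound, layer sizes, masking of invalid actions, and all identifiers below are assumptions for illustration, not the paper's actual configuration.

```python
# Illustrative sketch only: online/target Q-networks over an encoded state.
# STATE_DIM, MAX_ACTIONS, and the network shape are assumed values.
import torch
import torch.nn as nn

STATE_DIM = 32     # assumed size of the encoded state vector
MAX_ACTIONS = 16   # assumed upper bound on enabled rule applications


class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, MAX_ACTIONS),
        )

    def forward(self, state):
        # One Q-value per (possibly invalid) action slot.
        return self.layers(state)


online_net, target_net = QNet(), QNet()
target_net.load_state_dict(online_net.state_dict())


def select_action(state_vec, valid_mask):
    """Pick the best enabled action; invalid actions are masked out."""
    with torch.no_grad():
        q = online_net(state_vec)
        q[~valid_mask] = float("-inf")   # one way to handle invalid actions
        return int(q.argmax())


def td_target(reward, next_state_vec, next_valid_mask, done, gamma=0.99):
    """Double-DQN-style target: the online net chooses, the target net evaluates."""
    if done:
        return reward
    with torch.no_grad():
        q_online = online_net(next_state_vec)
        q_online[~next_valid_mask] = float("-inf")
        best = q_online.argmax()
        return reward + gamma * target_net(next_state_vec)[best].item()


# Example usage with a dummy encoded state and validity mask.
state = torch.randn(STATE_DIM)
mask = torch.tensor([True] * 5 + [False] * (MAX_ACTIONS - 5))
action = select_action(state, mask)
```

In this reading, masking disabled rule applications keeps the agent from ever choosing an invalid action, and the separate target network stabilizes the value estimates used as rewards/penalties accumulate in the experience dataset; the paper's own handling of these design points may differ.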