Reinforcement Learning (RL) is used to find an optimal policy, i.e., a rule that maps each state to an appropriate action. In dynamic resource allocation, RL learns utility functions so that, given the predicted future resource demand, the learned policy allocates resources in a way that avoids wasting energy and resources while preventing Service Level Agreement (SLA) violations and Quality of Service (QoS) degradation. However, RL faces two main problems in this field: obtaining good policies in the early phases of learning and the long time needed to converge to the optimal policy. This paper addresses these problems through an appropriate initialization of Q-learning and a new fuzzy approach that increases the speed of convergence to the optimal policy. The fuzzy approach presented in this paper improves both the accuracy and the convergence speed of Q-learning. The proposed method first predicts the future workload and then determines the appropriate number of physical machines using the optimal policy learned by the improved Q-learning. Evaluation results show that the proposed method outperforms similar methods in terms of accuracy and convergence speed.
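To make the idea concrete, the following is a minimal sketch of tabular Q-learning for choosing the number of active physical machines from a predicted workload level. The state and action discretization, the heuristic Q-table initializer, the fuzzy-style adaptive learning rate, and the toy reward are illustrative assumptions for exposition only, not the paper's exact formulation.

```python
import numpy as np

# Assumed discretization: states = predicted workload levels, actions = number of
# active physical machines (PMs). Sizes are illustrative.
N_STATES = 10      # discretized predicted-workload levels
N_ACTIONS = 8      # candidate counts of active PMs

def heuristic_init(n_states, n_actions):
    """Initialize Q with a simple prior: favor PM counts proportional to workload,
    so early policies are already reasonable instead of starting from an all-zero table."""
    q = np.zeros((n_states, n_actions))
    for s in range(n_states):
        preferred = int(round(s / (n_states - 1) * (n_actions - 1)))
        q[s] = -np.abs(np.arange(n_actions) - preferred)  # closer to preferred => higher prior value
    return q

def fuzzy_learning_rate(td_error, base_alpha=0.1):
    """Sketch of a fuzzy-style adaptive learning rate: a larger TD error (low confidence
    in the current estimate) yields a larger update step, a small error a smaller one."""
    e = min(abs(td_error), 1.0)        # membership of |TD error| in [0, 1]
    return base_alpha * (0.5 + e)      # scale alpha between 0.5x and 1.5x of the base rate

def reward(state, action):
    """Toy reward: penalize energy use (more PMs) and SLA risk (too few PMs for the load)."""
    demand = state / (N_STATES - 1) * (N_ACTIONS - 1)
    energy_cost = 0.1 * action
    sla_penalty = 1.0 if action < demand else 0.0
    return -(energy_cost + sla_penalty)

def train(episodes=2000, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = heuristic_init(N_STATES, N_ACTIONS)
    for _ in range(episodes):
        s = rng.integers(N_STATES)                      # current predicted workload level
        a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(q[s]))
        r = reward(s, a)
        s_next = rng.integers(N_STATES)                 # next predicted workload level
        td_error = r + gamma * np.max(q[s_next]) - q[s, a]
        q[s, a] += fuzzy_learning_rate(td_error) * td_error
    return q

if __name__ == "__main__":
    q_table = train()
    policy = np.argmax(q_table, axis=1)                 # PM count chosen per workload level
    print("Learned policy (workload level -> active PMs):", policy)
```

The heuristic initialization gives usable policies from the first episodes, and the adaptive learning rate is one plausible way a fuzzy controller can speed up convergence by taking larger steps when the value estimates are still inaccurate.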