Credibly aggregating a heterogeneous collection of studies from a retrospective literature review isn’t as easy as the textbooks or journals make it out to be. You might want to read the following threads, and follow up on a few of these references.
I may have become too skeptical, but I’ve come to the conclusion that if you can’t do a meta-regression to try to explain heterogeneity (too few studies), and can’t control for heterogeneity before collecting data (i.e., a prospective meta-analysis), you should just combine the p-values of the individual studies to indicate that there is indirect evidence of a possible effect.
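For anyone who wants to see what combining p-values looks like in practice, here is a minimal sketch using Fisher's method. That choice is my assumption for illustration; it is only one of several combination rules (Stouffer's, Edgington's, etc.), and it assumes the studies are independent. The function name and the p-values in the usage line are hypothetical, not taken from any real review. The statistic is X = -2 Σ ln(p_i), which is chi-square with 2k degrees of freedom under the joint null, and for even degrees of freedom the tail probability has a closed form, so nothing beyond the standard library is needed.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration: assumes the p-values come from
    independent studies and that each lies in (0, 1].
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("each p-value must be in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); we work with h = X / 2.
    h = -sum(math.log(p) for p in p_values)

    # Chi-square survival function with 2k degrees of freedom:
    # P(X > x) = exp(-h) * sum_{j=0}^{k-1} h**j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= h / j
        total += term
    return min(1.0, math.exp(-h) * total)


# Made-up p-values from four hypothetical studies.
combined = fisher_combined_pvalue([0.08, 0.20, 0.04, 0.11])
print(f"combined p-value: {combined:.4f}")
```

Note that the result only speaks to whether there is an effect in at least one study; it says nothing about the size or the direction of that effect, which is why I would describe it as indirect evidence.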
Effect size meta-analyses with very small numbers of studies are misleading in many (maybe even the majority of) cases, although I can think of a few examples where effect size aggregation with as few as 5 studies was valuable.
Care needs to be taken with effect size combination methods, as the following threads will show.
I recommend starting with the Senn articles, and then looking up the articles by @Sander on the issue of standardized effects.