Built an end-to-end pipeline that turns 1D stratigraphic realizations into playable 2D subsurface worlds, trains a Deep Q-Network (DQN) to plan drilling, and measures exploration-exploitation trade-offs. Facies sequences were generated with Markov chains conditioned on sea-level history, then expanded and perturbed via deformation, tilting, faulting, and unconformities to inject realistic heterogeneity.

Hydrocarbon migration was simulated agent by agent with facies- and permeability-aware move-probability matrices (MPMs); inverting the MPM logic provided plausible source-to-accumulation connectivity for labeling fluid presence. Fluid properties were mapped to elastic responses using Gassmann fluid substitution, and the resulting reflectivity series were convolved with a wavelet to produce the synthetic seismic amplitudes that serve as DQN observations.

The game-like RL environment enforced budgets, per-action costs, well limits, and sparse rewards; the DQN used CNN features, experience replay, and a target network.

Learned that annealing ε while using a high discount factor (γ ≈ 0.99) consistently outperformed constant-ε policies, yielding deeper, more profitable wells: evidence that deliberate early exploration plus strong long-term valuation beats premature exploitation. Also learned that geologic complexity synthesized with the Markov-chain + agent-based-model (ABM) stack improves policy robustness, because facies transitions and permeability contrasts expose the agent to the failure modes it must learn to avoid.
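The sea-level-conditioned Markov-chain facies generation can be sketched roughly as follows. The facies labels, the two transition matrices, and the rule of switching matrices on a sea-level threshold are all illustrative assumptions, not the original's actual parameterization:

```python
import numpy as np

# Hypothetical facies codes: 0 = sand, 1 = shale, 2 = carbonate.
# Two transition matrices, one per sea-level regime (values are made up).
P_low = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.3, 0.3, 0.4]])
P_high = np.array([[0.2, 0.5, 0.3],
                   [0.1, 0.6, 0.3],
                   [0.1, 0.4, 0.5]])

def generate_facies(sea_level, rng):
    """1D facies column: each step draws the next facies from the
    transition matrix selected by the current sea level."""
    facies = [0]  # assume the column starts in sand
    for sl in sea_level[1:]:
        P = P_high if sl > 0.0 else P_low
        facies.append(rng.choice(3, p=P[facies[-1]]))
    return np.array(facies)

rng = np.random.default_rng(0)
sea_level = np.sin(np.linspace(0, 4 * np.pi, 200))  # two sea-level cycles
column = generate_facies(sea_level, rng)
```

A column like this would then be replicated laterally and perturbed (tilting, faulting, unconformities) to build the 2D world.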
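A minimal sketch of the agent-by-agent migration step, assuming each hydrocarbon agent moves to a neighboring cell with probability proportional to that cell's permeability times an upward buoyancy weight (the grid, weights, and stopping rule are hypothetical stand-ins for the original's move-probability matrices):

```python
import numpy as np

def migrate(perm, start, n_steps, rng):
    """Random-walk one agent up a permeability grid.
    Moves: up, left, right; weight = buoyancy * neighbor permeability."""
    nz, nx = perm.shape
    z, x = start
    moves = [(-1, 0), (0, -1), (0, 1)]
    buoyancy = np.array([3.0, 1.0, 1.0])  # assumed upward bias
    for _ in range(n_steps):
        cands, w = [], []
        for (dz, dx), b in zip(moves, buoyancy):
            zz, xx = z + dz, x + dx
            if 0 <= zz < nz and 0 <= xx < nx:
                cands.append((zz, xx))
                w.append(b * perm[zz, xx])
        w = np.array(w)
        if w.sum() == 0.0:  # fully sealed: agent accumulates here
            break
        z, x = cands[rng.choice(len(cands), p=w / w.sum())]
    return z, x

rng = np.random.default_rng(1)
perm = rng.uniform(0.0, 1.0, size=(20, 10))
perm[5, :] = 0.0  # impermeable seal layer traps agents beneath it
final = migrate(perm, start=(19, 5), n_steps=200, rng=rng)
```

Agents pile up under zero-permeability seals, which is the behavior that makes the inverted logic usable for labeling plausible accumulations.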
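The fluid-to-elastic mapping uses the standard Gassmann equation for the saturated-rock bulk modulus; the moduli and porosity below are illustrative values (GPa), not numbers from the original:

```python
def gassmann_ksat(k_dry, k_min, k_fl, phi):
    """Saturated bulk modulus via Gassmann:
    K_sat = K_dry + (1 - K_dry/K_min)^2 /
            (phi/K_fl + (1 - phi)/K_min - K_dry/K_min^2)"""
    num = (1.0 - k_dry / k_min) ** 2
    den = phi / k_fl + (1.0 - phi) / k_min - k_dry / k_min ** 2
    return k_dry + num / den

# Illustrative case: same dry frame, brine vs. gas pore fill.
k_brine = gassmann_ksat(k_dry=12.0, k_min=36.0, k_fl=2.8, phi=0.25)
k_gas = gassmann_ksat(k_dry=12.0, k_min=36.0, k_fl=0.04, phi=0.25)
```

The brine-filled rock comes out stiffer than the gas-filled one, which is the contrast that makes fluid presence visible in the synthetic amplitudes.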
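The reflectivity-convolution step can be sketched with a Ricker wavelet, a common choice for synthetic seismic (the original does not specify its wavelet, so the frequency and sampling here are assumptions):

```python
import numpy as np

def ricker(f, dt, n):
    """Zero-phase Ricker wavelet of peak frequency f (Hz)."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def synthetic_trace(impedance, wavelet):
    """Normal-incidence reflectivity convolved with the wavelet."""
    rc = np.diff(impedance) / (impedance[1:] + impedance[:-1])
    return np.convolve(rc, wavelet, mode="same")

imp = np.full(100, 6.0e6)      # background acoustic impedance
imp[40:60] = 4.5e6             # low-impedance (fluid-bearing) interval
trace = synthetic_trace(imp, ricker(f=30.0, dt=0.002, n=64))
```

The impedance drop at the top of the interval produces a negative amplitude and the base a positive one, the kind of signature the DQN observes.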
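The annealed-ε schedule and the discounted TD target implied by γ ≈ 0.99 can be sketched as below; the schedule constants and function names are assumptions, not the original's hyperparameters:

```python
import numpy as np

def epsilon(step, eps_start=1.0, eps_end=0.05, anneal_steps=10_000):
    """Linearly anneal exploration rate from eps_start to eps_end."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def td_target(reward, q_next, done, gamma=0.99):
    """Discounted target r + gamma * max_a' Q_target(s', a'),
    with the bootstrap term zeroed at episode end."""
    return reward + gamma * np.max(q_next) * (1.0 - done)
```

With γ near 1, a deep well's distant payoff is barely discounted, so the target keeps valuing long drilling plans; early on, a large ε forces the agent to sample those plans before exploitation kicks in.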