This report describes the architecture of a performance monitoring tool for x86 processors. Although modern CPUs have Performance Monitoring Units (PMUs) that are relatively easy to program, collecting and interpreting the data they produce is considerably more complex. The discussion covers both how the raw performance data is obtained and stored and how it is subsequently analyzed.