Logtail review

12/4/2023

While WORM might not have been a huge hit CD-R and DVD-R were a big deal for a long time. The target archival media was optical WORM. Vacuum in modern PostgreSQL does a full index scan for every secondary index of the vacuumed relation when there are rows to remove. Vacuum did a full scan of the relation in POSTGRES while modern PostgreSQL only checks pages that require vacuum courtesy of the visibility map It also sets bits in the visibility map and does work to avoid transaction ID wraparound. It does reclaim space for versions that have been deleted and are no longer visible. Vacuum in modern PostgreSQL doesn't move older versions to an archive. If heavy-archive is set for the relation then vacuum will set Tmin/Tmax if unset.ĭifferences between POSTGRES and modern PostgreSQL include: Old versions for relations marked as light-archive and heavy-archive are moved to archive storage. This reclaims space from aborted transactions and moves old versions to the archive. The example was vacuum rel-name after "30 days". POSTGRES had a command to trigger vacuum of a relation. With HOT, a new tuple placed on the same page and with all indexed columns the same as its parent row version does not get new index entries. Without HOT, every version of a row in an update chain has its own index entries, even if all indexed columns are the same. If no indexed columns have changed then the Heap Only Tuples (HOT) optimization is used and the new version is added to the end of the update chain and secondary index entries reference the line pointer for the head of the update chain. So if there 3 secondary indexes and an update changes a column used by 1 of them then maintenance is done for all of them - unless HOT is used.

Also modern PostgreSQL does secondary index maintenance for all secondary indexes unless no indexed columns have changed. Modern PostgreSQL uses the name line pointer. On update a secondary index only needs maintenance if the indexed columns have been changed. Secondary index entry points to line table entry. On each page there is a line table with an entry per anchor point record. It is possible that the thing (query, vacuum) that searches TIME will be delayed by disk reads. Vacuum sets Tmin/Tmax for heavy-archive when moving older versions to the archive. For heavy-archive the reader (a query) will lookup the commit time from the TIME relation and update Tmin/Tmax (thus making a page dirty). For light-archive, old versions are moved to the archive but Tmin/Tmax are not set to avoid the overhead of doing a search of the transaction log to determine their status. Tmin and Tmax are never set for no-archive relations and I assume old versions for them are not moved to the archive. Relations are marked by the user as no-archive, light-archive or heavy-archive. The tail of TIME should be stored in stable main memory to avoid forcing a disk page on commit. This has 32 bits per XID and is updated on commit. POSTGRES contains a TIME relation that has the commit time for each transaction. I am wary of the ability to keep the log in memory for high TPS systems but this is a problem they didn't need to solve at the time.Ģ.4 Concurrency Control and Timestamp Management The ability to keep that in memory is explained in section 2.1. The need to check the transaction log also means that the searched parts of the log must remain in memory or there will be disk reads. That check wouldn't be needed if the commit timestamp were written into the row on commit - but doing that is non-trivial and can hurt performance. The logic includes a check of the transaction log to determine whether the transaction from Xmin or Xmax committed. The check is more complicated than what InnoDB and RocksDB require, but I assume the CPU overhead is not that different than what occurs in modern PostgreSQL and in my testing of modern PG this isn't an issue. This section shows the logic required to determine whether a version is visible to a query. I wonder if that is a feature that was removed in early PostgreSQL and then was returned for a different reason. The notion of an anchor point and update (delta) chain is similar to the current support for Heap Only Tuples (HOT) in modern PostgreSQL. The oldest version of a row was called the anchor point. To use less space updates only stored fields that changed and the other fields were found by following the PTR chain (a singly-linked list). The new version reused the OID of the ended version and the PTR for the new version pointed to the ended version. On update Xmax and Cmax were set to end the row version and a new version of the row was inserted (hopefully to the same page). Tmin was not set because commit had yet to occur. On insert the OID, Xmin and Cmin were set.

0 Comments

Logtail review

Leave a Reply.

Author

Archives

Categories