ERROR: invalid page in block 572319 of relation base/123456/234567
ERROR: invalid memory alloc request size 18446744073709551614
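Both messages above can be decoded a little further. The following is a rough sketch, assuming PostgreSQL's default build settings of 8 kB blocks and 1 GB segment files (both are compile-time options, so check `SHOW block_size` and `SHOW segment_size` on the affected server). It maps the reported block number to a segment file and byte offset, and reinterprets the requested allocation size as a signed 64-bit integer. The helper names are hypothetical, not part of any PostgreSQL tool.

```python
BLOCK_SIZE = 8192            # assumed default block size (BLCKSZ)
BLOCKS_PER_SEGMENT = 131072  # assumed default: 1 GB segment / 8 kB blocks


def locate_block(relpath, block):
    """Return (segment file path, byte offset) for a relation block number."""
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are named <relfilenode>.<n>
    path = relpath if segment == 0 else f"{relpath}.{segment}"
    return path, block_in_segment * BLOCK_SIZE


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - 2**64 if value >= 2**63 else value


print(locate_block("base/123456/234567", 572319))
print(as_signed_64(18446744073709551614))
```

Under those assumptions the bad page sits at byte offset 393469952 of `base/123456/234567.4`, and the allocation size is -2 read as an unsigned number, which typically points at a corrupted length field rather than a genuine request for memory.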
"Of the total sample of 1.53 million disks, 3855 disks developed checksum mismatches – 3088 of the 358,000 nearline disks (0.86%) and 767 of the 1.17 million enterprise class disks (0.065%)." - An Analysis of Data Corruption in the Storage Stack
fsync
pg_test_fsync
numbers are too good to be true, they probably are.
DROP TABLE
fsync and full_page_writes
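Assuming the point here is that both settings should stay on, a configuration file can be scanned for them being switched off. This is a minimal sketch with a hypothetical helper name. It only looks at the text it is given, so it does not follow `include` directives or see values set through `ALTER SYSTEM`. On a running server, `SHOW fsync` and `SHOW full_page_writes` are the more reliable check.

```python
import re

RISKY = ("fsync", "full_page_writes")
OFF_VALUES = {"off", "false", "no", "0"}


def risky_settings(conf_text):
    """Return the durability settings that the given config text turns off."""
    found = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # postgresql.conf accepts "name = value" as well as "name value"
        match = re.match(r"(\w+)\s*=?\s*(.+)$", line)
        if match:
            value = match.group(2).strip().strip("'").lower()
            found[match.group(1).lower()] = value  # the last entry wins
    return [name for name in RISKY if found.get(name) in OFF_VALUES]


sample = "fsync = off   # faster bulk load\nfull_page_writes = on\n"
print(risky_settings(sample))
```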
Is the data readable?
find "$PGDATA" -type f -print0 | xargs -0 md5sum
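The same readability check can be sketched in Python for cases where a list of failing files is more useful than checksum output. This is an illustration only. The function name is hypothetical, and it simply reads every file to the end and records any I/O error the operating system reports.

```python
import os


def unreadable_files(root, chunk_size=1024 * 1024):
    """Read every file under root and return a list of (path, error) pairs."""
    failures = []

    def record(exc):
        # Called by os.walk when a directory itself cannot be listed
        failures.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass  # the data is discarded; only read errors matter
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

Calling `unreadable_files(os.environ["PGDATA"])` as the database owner returns an empty list when every file could be read. That only shows the storage returned the bytes, not that their contents are valid.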
REINDEX
https://github.com/ants/pg-recovery-tools/blob/master/trycopy.py
https://github.com/ants/pg-recovery-tools/blob/master/xlogfilter.py
x daily backups, y weekly backups, z monthly backups
Not using ECC memory is just asking for trouble.
"For example, we observe DRAM error rates that are orders of magnitude higher than previously reported, with 25,000 to 70,000 errors per billion device hours per Mbit and more than 8% of DIMMs affected by errors per year." - DRAM Errors in the Wild: A Large-Scale Field Study
Overclocking your database server is a bad idea.