When the database has a lot of freepages, the cost to sync all
freepages down to disk is high. If the total database size is
small (<10GB), and the application can tolerate ~10 seconds
recovery time, then it is reasonable to simply not sync freelist
and rescan the db to rebuild freelist on recovery.
Read txns would lock pages allocated after the txn, keeping those pages
off the free list until closing the read txn. Instead, track allocating
txid to compute page lifetime, freeing pages if all txns between
page allocation and page free are closed.
Using a large (50gb) database with a read-write-delete heavy load,
nearly 100% of allocated space came from freelists.
1/3 came from freelist.release, 1/3 from freelist.write,
and 1/3 came from tx.allocate to make space for freelist.write.
In the case of freelist.write, the newly allocated giant slice gets
copied to the space prepared by tx.allocate and then discarded.
To avoid this, add func mergepgids that accepts a destination slice,
and use it in freelist.write.
This has a mild negative impact on the existing benchmarks,
but cuts allocated space in my real world db by over 30%.
name old time/op new time/op delta
_FreelistRelease10K-8 18.7µs ±10% 18.2µs ± 4% ~ (p=0.548 n=5+5)
_FreelistRelease100K-8 233µs ± 5% 258µs ±20% ~ (p=0.151 n=5+5)
_FreelistRelease1000K-8 3.34ms ± 8% 3.13ms ± 8% ~ (p=0.151 n=5+5)
_FreelistRelease10000K-8 32.3ms ± 1% 32.2ms ± 7% ~ (p=0.690 n=5+5)
DBBatchAutomatic-8 2.18ms ± 3% 2.19ms ± 4% ~ (p=0.421 n=5+5)
DBBatchSingle-8 140ms ± 6% 140ms ± 4% ~ (p=0.841 n=5+5)
DBBatchManual10x100-8 4.41ms ± 2% 4.37ms ± 3% ~ (p=0.548 n=5+5)
name old alloc/op new alloc/op delta
_FreelistRelease10K-8 82.5kB ± 0% 82.5kB ± 0% ~ (all samples are equal)
_FreelistRelease100K-8 805kB ± 0% 805kB ± 0% ~ (all samples are equal)
_FreelistRelease1000K-8 8.05MB ± 0% 8.05MB ± 0% ~ (all samples are equal)
_FreelistRelease10000K-8 80.4MB ± 0% 80.4MB ± 0% ~ (p=1.000 n=5+5)
DBBatchAutomatic-8 384kB ± 0% 384kB ± 0% ~ (p=0.095 n=5+5)
DBBatchSingle-8 17.2MB ± 1% 17.2MB ± 1% ~ (p=0.310 n=5+5)
DBBatchManual10x100-8 908kB ± 0% 902kB ± 1% ~ (p=0.730 n=4+5)
name old allocs/op new allocs/op delta
_FreelistRelease10K-8 5.00 ± 0% 5.00 ± 0% ~ (all samples are equal)
_FreelistRelease100K-8 5.00 ± 0% 5.00 ± 0% ~ (all samples are equal)
_FreelistRelease1000K-8 5.00 ± 0% 5.00 ± 0% ~ (all samples are equal)
_FreelistRelease10000K-8 5.00 ± 0% 5.00 ± 0% ~ (all samples are equal)
DBBatchAutomatic-8 10.2k ± 0% 10.2k ± 0% +0.07% (p=0.032 n=5+5)
DBBatchSingle-8 58.6k ± 0% 59.6k ± 0% +1.70% (p=0.008 n=5+5)
DBBatchManual10x100-8 6.02k ± 0% 6.03k ± 0% +0.17% (p=0.029 n=4+4)
This commits fixes a timing bug where `DB.StrictMode` can panic
before the goroutine reading the database can finish. If an error
is found in strict mode then it now finishes reading the entire
database before panicking.
This commit changes `Tx.WriteTo()` to use the transaction's
in-memory meta page instead of copying from the disk. This is
needed because the transaction uses the size from its meta page
but writes the current meta page on disk which may have allocated
additional pages since the transaction started.
Fixes#513
This commit refactors the test suite to make it cleaner and to use the
standard testing library better. The `assert()`, `equals()`, and `ok()`
functions have been removed and some test names have been changed for
clarity.
No functionality has been changed.
Only grow the database size when the high watermark increases.
We also grows the database size a little bit aggressively to
save a few ftruncates.
I have tested this on various environments. The performance impact
is ignorable with 16MB over allocation. Without over allocation,
the performance might decrease 100% when each Tx.Commit needs a new
page on a very slow disk (seek time dominates the total write).
This commit removes references to the last write transaction and
and its dirty pages after the transaction has been closed. This
bug did not affect the safety of the data, however, it would
cause dirty pages to linger in-memory until the next write
transaction began.
Added a few lines of documentation to clarify that read-only
transactions need to be rolled back and not committed, as per the
discussion in issue #344
when accessing the node data we used to use cast to
*[maxAllocSize]byte, which breaks if we try to go across maxAllocSize boundary.
This leads to occasional panics.
Sample stacktrace:
```
panic: runtime error: slice bounds out of range
goroutine 1 [running]:
github.com/boltdb/bolt.(*node).write(0xc208010f50, 0xc27452a000)
$GOPATH/src/github.com/boltdb/bolt/node.go:228 +0x5a5
github.com/boltdb/bolt.(*node).spill(0xc208010f50, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/node.go:364 +0x506
github.com/boltdb/bolt.(*node).spill(0xc208010700, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/node.go:336 +0x12d
github.com/boltdb/bolt.(*node).spill(0xc208010620, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/node.go:336 +0x12d
github.com/boltdb/bolt.(*Bucket).spill(0xc22b6ae880, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/bucket.go:535 +0x1c4
github.com/boltdb/bolt.(*Bucket).spill(0xc22b6ae840, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/bucket.go:502 +0xac2
github.com/boltdb/bolt.(*Bucket).spill(0xc22f4e2018, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/bucket.go:502 +0xac2
github.com/boltdb/bolt.(*Tx).Commit(0xc22f4e2000, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/tx.go:150 +0x1ee
github.com/boltdb/bolt.(*DB).Update(0xc2080e4000, 0xc24d077508, 0x0, 0x0)
$GOPATH/src/github.com/boltdb/bolt/db.go:483 +0x169
```
It usually happens when working with large (50M/100M) values.
One way to reproduce it is to change maxAllocSize in bolt_amd64.go to 70000 and run the tests.
TestBucket_Put_Large crashes.
This commit moves the functionality in Tx.Copy() to Tx.WriteTo(). This
allows Tx to be used as an io.WriterTo which makes it easier to mock.
The Tx.Copy() function still exists but it's simply a wrapper around
Tx.WriteTo().
OpenBSD does not include a UBC kernel and writes must be synchronized
with the msync(2) syscall. In addition, the NoSync field of the DB
struct should be ignored on OpenBSD, since unlike other platforms,
missing msyncs will result in data corruption.
Depends on PR #258.
Fixes#257.
This commit adds the DB.NoSync flag to skip fsync() calls on each commit. This should only
be used for bulk loading as it can corrupt your database in the event of a system failure.
Initial tests show it can provide a 2x speed up for sequential inserts.
This commit changes TxStats.RebalanceTime to only update if there are nodes that have been
rebalanced. Previously, the RebalanceTime would include a small amount of time to check if
there were nodes to rebalance but this is confusing to users and not terribly helpful.
This commit adds a defer handler to ensure that transactions are always closed out - even
in the event of a panic within user code. It's recommended that applications always fail
in the event of a panic but some packages such as net/http will automatically recover
which is a problem (IHMO).
This commit adds a cache to the freelist which combines the available free pages and pending free pages in
a single map. This was added to improve performance where freelist.isFree() was consuming 70% of CPU time
for large freelists.
This commit adds an explicit DefaultOptions variable for additional documentation.
Open() can still be passed a nil options which will cause options to be change to
the DefaultOptions variable. This change also allows options to be set globally for
an application if more than one database is being opened in a process.
This commit also moves all errors to errors.go so that the godoc groups them together.
This commit adds the ability for Bolt to fallback to using a regular file open if Tx.Copy()
errors while opening with O_DIRECT. This only affects Linux.
This commit removes several memory allocations occurring on every page and also caches the freelist map used when iterating over the pages. This results in significantly better performance.
The typical use these days is with a managed transaction, via db.View.
The first case (error when re-opening database file) is not tested;
it is harder to instrument, and I have other plans for it.
This commit changes Tx.Check() to return a channel through which check errors are returned. This allows
errors to be found before checking the entire data file.
This commit removes the DB.Check() function and instead makes the user decide
whether a transaction should be writable or read-only. Tx.Check() is not safe
to use concurrently on a read-only transaction, however, it significantly
improves the performance of it.