This commit adds a background reader that can optionally buffer reads.
It is used whenever a potentially blocking write is made to the server.
The background reader is started on a slight delay so there should be no
meaningful performance impact as it doesn't run for quick queries and
its overhead is minimal relative to slower queries.
The non-blocking IO system was designed to solve three problems:
1. Deadlock that can occur when both sides of a connection are blocked
writing because all buffers between are full.
2. The inability to use a write deadline with a TLS.Conn without killing
the connection.
3. Efficiently check if a connection has been closed before writing.
This reduces the cases where the application doesn't know if a query
that does a INSERT/UPDATE/DELETE was actually sent to the server or
not.
However, the nbconn package is extraordinarily complex, has been a
source of very tricky bugs, and has OS specific code paths. It also does
not work at all with underlying net.Conn implementations that do not
have platform specific non-blocking IO syscall support and do not
properly implement deadlines. In particular, this is the case with
golang.org/x/crypto/ssh.
I believe the deadlock problem can be solved with a combination of a
goroutine for CopyFrom like v4 used and a watchdog for regular queries
that uses time.AfterFunc.
The write deadline problem actually should be ignorable. We check for
context cancellation before sending a query and the actual Write should
be almost instant as long as the underlying connection is not blocked.
(We should only have to wait until it is accepted by the OS, not until
it is fully sent.)
Efficiently checking if a connection has been closed is probably the
hardest to solve without non-blocking reads. However, the existing code
only solves part of the problem. It can detect a closed or broken
connection the OS knows about, but it won't actually detect other types
of broken connections such as a network interruption. This is currently
implemented in CheckConn and called automatically when checking a
connection out of the pool that has been idle for over one second. I
think that changing CheckConn to a very short deadline read and changing
the pool to do an actual Ping would be an acceptable solution.
Remove nbconn and non-blocking code. This does not leave the system in
an entirely working state. In particular, CopyFrom is broken, deadlocks
can occur for extremely large queries or batches, and PgConn.CheckConn
is now a `select 1` ping. These will be resolved in subsequent commits.
I am working on an application that reads a lot of hstore values, and
have discovered that scanning it is fairly slow. I'm working on some
improvements, but first I wanted a better benchmark. This adds more
realistic data, and extends it to cover the three APIs: database/sql,
and pgconn.Rows.Scan with both text and binary protocols.
Remove unneeded string to []byte to string conversion, which saves 2
allocs and should make Hstore text scanning slightly faster.
The Hstore.Scan() function takes a string as input, converts it to
[]byte, and calls scanPlanTextAnyToHstoreScanner.Scan(). That
function converts []byte back to string and calls parseHstore. This
refactors scanPlanTextAnyToHstoreScanner.Scan into
scanPlanTextAnyToHstoreScanner.scanString so the database/sql Scan
function can call it directly, bypassing this conversion.
The added Benchmark shows this saves 2 allocs for longer strings, and
saves about 5% CPU overall on my M1 Pro. benchstat output:
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
│ orig.txt │ new.txt │
│ sec/op │ sec/op vs base │
HstoreScan-10 1.334µ ± 2% 1.257µ ± 2% -5.77% (p=0.000 n=10)
│ orig.txt │ new.txt │
│ B/op │ B/op vs base │
HstoreScan-10 2.094Ki ± 0% 1.969Ki ± 0% -5.97% (p=0.000 n=10)
│ orig.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
HstoreScan-10 36.00 ± 0% 34.00 ± 0% -5.56% (p=0.000 n=10)
Postgres on Mac OS X has a bug in how it parses hstore text values
that causes it to misinterpret some Unicode values as spaces. This
causes values sent by pgx to be misinterpreted. To avoid this, always
quote hstore values, which is how Postgres serializes them itself.
The test change fails on Mac OS X without this fix.
While I suspect this should not be performance critical for any
application, I added a quick benchmark to test the performance of the
encoding. This change actually makes encoding slightly faster on my
M1 Pro. The output from the benchstat program on this banchmark is:
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
│ orig.txt │ new-quotes.txt │
│ sec/op │ sec/op vs base │
HstoreSerialize/text-10 207.1n ± 0% 142.3n ± 1% -31.31% (p=0.000 n=10)
HstoreSerialize/binary-10 100.10n ± 0% 99.64n ± 1% -0.45% (p=0.013 n=10)
geomean 144.0n 119.1n -17.31%
I have also attempted to fix the Postgres bug, but it will take a
long time for this fix to get upstream:
https://www.postgresql.org/message-id/CA%2BHWA9awUW0%2BRV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig%40mail.gmail.com
Tests should timeout in a reasonable time if something is stuck. In
particular this is important when testing deadlock conditions such as
can occur with the copy protocol if both the client and the server are
blocked writing until the other side does a read.
During the update also the following packages were updated:
golang.org/x/sys v0.5.0 to v0.8.0
golang.org/x/text v0.7.0 to v0.9.0
Signed-off-by: Jonathan Gonzalez V <jonathan.abdiel@gmail.com>
This ensures that a closed connection at the pgconn layer is not
considered okay when the background closing of the net.Conn is still in
progress.
This also means that CheckConn cannot be called when the connection is
locked (for example, by in an progress query). But that seems
reasonable. It's not exactly clear that that would have ever worked
anyway.
https://github.com/jackc/pgx/issues/1618#issuecomment-1563702231
If the default_query_exec_mode is unknown, the returned error
previously was:
invalid default_query_exec_mode: <nil>
This changes it to return the argument. Add a test that unknown modes
fail to parse and include this string.
The tests for cancelling requests were failing when using unix
sockets. The reason is that net.Conn.RemoteAddr() calls getpeername()
to get the address. For Unix sockets, this returns the address that
was passed to bind() by the *server* process, not the address that
was passed to connect() by the *client*. For postgres, this is always
relative to the server's directory, so is a path like:
./.s.PGSQL.5432
Since it does not return the full absolute path, this function cannot
connect, so it cannot cancel requests. To fix it, use the connection's
config for Unix sockets. I think this should be okay, since a system
using unix sockets should not have "fallbacks". If that is incorrect,
we will need to save the address on PgConn.
Fixes the following failed tests when using Unix sockets:
--- FAIL: TestConnCancelRequest (2.00s)
pgconn_test.go:2056:
Error Trace: /Users/evan.jones/pgx/pgconn/pgconn_test.go:2056
/Users/evan.jones/pgx/pgconn/asm_arm64.s:1172
Error: Received unexpected error:
dial unix ./.s.PGSQL.5432: connect: no such file or directory
Test: TestConnCancelRequest
pgconn_test.go:2063:
Error Trace: /Users/evan.jones/pgx/pgconn/pgconn_test.go:2063
Error: Object expected to be of type *pgconn.PgError, but was <nil>
Test: TestConnCancelRequest
--- FAIL: TestConnContextCanceledCancelsRunningQueryOnServer (5.10s)
pgconn_test.go:2109:
Error Trace: /Users/evan.jones/pgx/pgconn/pgconn_test.go:2109
Error: Received unexpected error:
timeout: context already done: context deadline exceeded
Test: TestConnContextCanceledCancelsRunningQueryOnServer
Previously, items were never removed from the preparedStatements map.
This means workloads that send a large number of unique queries could
run out of memory. Delete items from the map when sending the
deallocate command to Postgres. Add a test to verify this works.
Fixes https://github.com/jackc/pgx/issues/1456
Before this change, the Hstore text protocol did not quote keys or
values containing non-space whitespace ("\r\n\v\t"). This causes
inserts with these values to fail with errors like:
ERROR: Syntax error near "r" at position 17 (SQLSTATE XX000)
The previous version also quoted curly braces ("{}"), but they don't
seem to require quoting.
It is possible that it would be easier to just always quote the
values, which is what Postgres does when encoding its text protocol,
but this is a smaller change.
The parseHstore function did not check the return value from
p.Consume() after a ', ' sequence. It expects a doublequote '"' that
starts the next key, but would accept any character. This means it
accepted invalid input such as:
"key1"=>"b", ,key2"=>"value"
Add a unit test that covers this case
Fix a couple of the nearby error strings while looking at this.
Found by looking at staticcheck warnings:
pgtype/hstore.go:434:6: this value of end is never used (SA4006)
pgtype/hstore.go:434:6: this value of r is never used (SA4006)
The existing test registers pgtype.Hstore in the text map, then uses
the query modes that use the binary protocol. The existing test did
not use the text parsing code. Add a version of the test that uses
pgtype.Hstore as the input and output argument in all query modes,
and tests it without registering the codec.
When running queries with the hstore type registered, and with simple
mode queries, the scan implementation does not correctly distinguish
between NULL and empty. Fix the implementation and add a test to
verify this.