Underlying types were already tried. But []byte is not a normal
underlying type. It is a slice. But since is can be treated as a scalar
instead of an array / slice we need to special case it.
https://github.com/jackc/pgx/issues/1763
The context timeouts for tests are designed to give a better error
message when something hangs rather than the test just timing out.
Unfortunately, the potato CI frequently has some test or another
randomly take a long time. While the increased times are somewhat less
than optimal on a real computer, hopefully this will solve the
flickering CI.
Most of this is in conversion, and I assume it became unused with some
of the v5 changes and refactors to a codec-based approach.
There are likely a few more cleanups to be made, but these ones seemed
easy and safe to start with.
As recommended by go-staticcheck, but also might be a bit more efficient
for the compiler to implement, since we don't care about which slice of
bytes is greater than the other one.
This ensures the output of Encode can pass through Scan and produce
the same input. This found two two minor problems with the text
codec. These are not bugs: These situations do not happen when using
pgx with Postgres. However, I think it is worth fixing to ensure the
code is internally consistent.
The problems with the text codec are:
* It did not correctly distinguish between nil and empty. This is not
a problem with Postgres, since NULL values are marked separately,
but the binary codec distinguishes between them, so it seems like
the text codec should as well.
* It did not output spaces between keys. Postgres produces output in
this format, and the parser now only strictly parses the Postgres
format. This is not a bug, but seems like a good idea.
Arrays with values that start or end with vtab ("\v") must be quoted.
Postgres's array parser skips leading and trailing whitespace with
the array_isspace() function, which is slightly different from the
scanner_isspace() function that was previously linked. Add a test
that reproduces this failure, and fix the definition of isSpace.
This also includes a change to use strings.EqualFold which should
really not matter, but does not require copying the string.
* use []string for value string pointers: one allocation instead of
one per value.
* use one string for all key/value pairs, instead of one for each.
After this change, one Hstore will share two allocations: one string
and one []string. The disadvantage is that it cannot be deallocated
until all key/value pairs are unused. This means if an application
takes a single key or value from the Hstore and holds on to it, its
memory footprint will increase. I would guess this is an unlikely
problem, but it is possible.
The benchstat results from my M1 Max are below.
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
│ orig.txt │ new.txt │
│ sec/op │ sec/op vs base │
HstoreScan/databasesql.Scan-10 82.11µ ± 1% 82.66µ ± 2% ~ (p=0.436 n=10)
HstoreScan/text-10 83.30µ ± 1% 84.24µ ± 3% ~ (p=0.165 n=10)
HstoreScan/binary-10 15.987µ ± 2% 7.459µ ± 6% -53.35% (p=0.000 n=10)
geomean 47.82µ 37.31µ -21.98%
│ orig.txt │ new.txt │
│ B/op │ B/op vs base │
HstoreScan/databasesql.Scan-10 56.23Ki ± 0% 56.23Ki ± 0% ~ (p=0.324 n=10)
HstoreScan/text-10 65.12Ki ± 0% 65.12Ki ± 0% ~ (p=0.675 n=10)
HstoreScan/binary-10 21.09Ki ± 0% 20.73Ki ± 0% -1.70% (p=0.000 n=10)
geomean 42.58Ki 42.34Ki -0.57%
│ orig.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
HstoreScan/databasesql.Scan-10 744.0 ± 0% 744.0 ± 0% ~ (p=1.000 n=10) ¹
HstoreScan/text-10 743.0 ± 0% 743.0 ± 0% ~ (p=1.000 n=10) ¹
HstoreScan/binary-10 464.00 ± 0% 41.00 ± 0% -91.16% (p=0.000 n=10)
geomean 635.4 283.0 -55.46%
¹ all samples are equal
I am working on an application that uses hstore types, and we found
that returning the values is slow, particularly when using the text
protocol, such as when using database/sql. This improves parsing to
be about 6X faster (currently faster than binary). The changes are:
* referencing the original string instead of copying into new strings
(very large win)
* using string.IndexByte to scan double quoted strings: it has
architecture-specific assembly implementations, and most of the
time is spent in key/value strings.
* estimating the number of key/value pairs to allocate the correct
size of the slice and map up front. This reduces the number of
allocations and bytes allocated by a factor of 2, and was a small
CPU win.
* parsing directly into the Hstore, rather than copying into it.
This parser is stricter than the old one. It only accepts hstore
strings serialized by Postgres. The old one was already stricter
than Postgres's own parser, but previously accepted any whitespace
character after a comma. This one only accepts space. Example:
"k1"=>"v1",\t"k2"=>"v2"
Postgres only ever uses ", " as the separator. See hstore_out:
https://github.com/postgres/postgres/blob/master/contrib/hstore/hstore_io.c
The result of using benchstat to compare the benchmark on my M1 Pro
with the following command line in below. The new text parser is now
faster than the binary parser. I will improve the binary parser in a
separate change.
for i in $(seq 10); do go test ./pgtype -run=none -bench=BenchmarkHstoreScan -benchtime=1s >> new.txt; done
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
│ orig.txt │ new.txt │
│ sec/op │ sec/op vs base │
HstoreScan/databasesql.Scan-10 82.11µ ± 1% 10.51µ ± 0% -87.20% (p=0.000 n=10)
HstoreScan/text-10 83.30µ ± 1% 11.49µ ± 1% -86.20% (p=0.000 n=10)
HstoreScan/binary-10 15.99µ ± 2% 15.77µ ± 1% -1.35% (p=0.007 n=10)
geomean 47.82µ 12.40µ -74.08%
│ orig.txt │ new.txt │
│ B/op │ B/op vs base │
HstoreScan/databasesql.Scan-10 56.23Ki ± 0% 11.68Ki ± 0% -79.23% (p=0.000 n=10)
HstoreScan/text-10 65.12Ki ± 0% 20.58Ki ± 0% -68.40% (p=0.000 n=10)
HstoreScan/binary-10 21.09Ki ± 0% 21.09Ki ± 0% ~ (p=0.378 n=10)
geomean 42.58Ki 17.18Ki -59.66%
│ orig.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
HstoreScan/databasesql.Scan-10 744.00 ± 0% 44.00 ± 0% -94.09% (p=0.000 n=10)
HstoreScan/text-10 743.00 ± 0% 44.00 ± 0% -94.08% (p=0.000 n=10)
HstoreScan/binary-10 464.0 ± 0% 464.0 ± 0% ~ (p=1.000 n=10) ¹
geomean 635.4 96.49 -84.81%
¹ all samples are equal
I am working on an application that reads a lot of hstore values, and
have discovered that scanning it is fairly slow. I'm working on some
improvements, but first I wanted a better benchmark. This adds more
realistic data, and extends it to cover the three APIs: database/sql,
and pgconn.Rows.Scan with both text and binary protocols.
Remove unneeded string to []byte to string conversion, which saves 2
allocs and should make Hstore text scanning slightly faster.
The Hstore.Scan() function takes a string as input, converts it to
[]byte, and calls scanPlanTextAnyToHstoreScanner.Scan(). That
function converts []byte back to string and calls parseHstore. This
refactors scanPlanTextAnyToHstoreScanner.Scan into
scanPlanTextAnyToHstoreScanner.scanString so the database/sql Scan
function can call it directly, bypassing this conversion.
The added Benchmark shows this saves 2 allocs for longer strings, and
saves about 5% CPU overall on my M1 Pro. benchstat output:
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
│ orig.txt │ new.txt │
│ sec/op │ sec/op vs base │
HstoreScan-10 1.334µ ± 2% 1.257µ ± 2% -5.77% (p=0.000 n=10)
│ orig.txt │ new.txt │
│ B/op │ B/op vs base │
HstoreScan-10 2.094Ki ± 0% 1.969Ki ± 0% -5.97% (p=0.000 n=10)
│ orig.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
HstoreScan-10 36.00 ± 0% 34.00 ± 0% -5.56% (p=0.000 n=10)
Postgres on Mac OS X has a bug in how it parses hstore text values
that causes it to misinterpret some Unicode values as spaces. This
causes values sent by pgx to be misinterpreted. To avoid this, always
quote hstore values, which is how Postgres serializes them itself.
The test change fails on Mac OS X without this fix.
While I suspect this should not be performance critical for any
application, I added a quick benchmark to test the performance of the
encoding. This change actually makes encoding slightly faster on my
M1 Pro. The output from the benchstat program on this banchmark is:
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
│ orig.txt │ new-quotes.txt │
│ sec/op │ sec/op vs base │
HstoreSerialize/text-10 207.1n ± 0% 142.3n ± 1% -31.31% (p=0.000 n=10)
HstoreSerialize/binary-10 100.10n ± 0% 99.64n ± 1% -0.45% (p=0.013 n=10)
geomean 144.0n 119.1n -17.31%
I have also attempted to fix the Postgres bug, but it will take a
long time for this fix to get upstream:
https://www.postgresql.org/message-id/CA%2BHWA9awUW0%2BRV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig%40mail.gmail.com
Tests should timeout in a reasonable time if something is stuck. In
particular this is important when testing deadlock conditions such as
can occur with the copy protocol if both the client and the server are
blocked writing until the other side does a read.
Before this change, the Hstore text protocol did not quote keys or
values containing non-space whitespace ("\r\n\v\t"). This causes
inserts with these values to fail with errors like:
ERROR: Syntax error near "r" at position 17 (SQLSTATE XX000)
The previous version also quoted curly braces ("{}"), but they don't
seem to require quoting.
It is possible that it would be easier to just always quote the
values, which is what Postgres does when encoding its text protocol,
but this is a smaller change.
The parseHstore function did not check the return value from
p.Consume() after a ', ' sequence. It expects a doublequote '"' that
starts the next key, but would accept any character. This means it
accepted invalid input such as:
"key1"=>"b", ,key2"=>"value"
Add a unit test that covers this case
Fix a couple of the nearby error strings while looking at this.
Found by looking at staticcheck warnings:
pgtype/hstore.go:434:6: this value of end is never used (SA4006)
pgtype/hstore.go:434:6: this value of r is never used (SA4006)
The existing test registers pgtype.Hstore in the text map, then uses
the query modes that use the binary protocol. The existing test did
not use the text parsing code. Add a version of the test that uses
pgtype.Hstore as the input and output argument in all query modes,
and tests it without registering the codec.
When running queries with the hstore type registered, and with simple
mode queries, the scan implementation does not correctly distinguish
between NULL and empty. Fix the implementation and add a test to
verify this.
Table types have system / hidden columns like tableoid, cmax, xmax, etc.
These are not included when sending or receiving composite types.
https://github.com/jackc/pgx/issues/1576
When using `scany` I encountered the following case. This seems to fix it.
Looks like null `jsonb` columns cause the problem. If you create a table like below you can see that the following code fails. Is this expected?
```sql
CREATE TABLE test (
a int4 NULL,
b int4 NULL,
c jsonb NULL
);
INSERT INTO test (a, b, c) VALUES (1, null, null);
```
```go
package main
import (
"context"
"log"
"github.com/georgysavva/scany/v2/pgxscan"
"github.com/jackc/pgx/v5"
)
func main() {
var rows []map[string]interface{}
conn, _ := pgx.Connect(context.Background(), , ts.PGURL().String())
// this will fail with can't scan into dest[0]: cannot scan NULL into *interface {}
err := pgxscan.Select(context.Background(), conn, &rows, `SELECT c from test`)
// this works
// err = pgxscan.Select(context.Background(), conn, &rows, `SELECT a,b from test`)
if err != nil {
panic(err)
}
log.Printf("%+v", rows)
}
```