15 Commits

Author SHA1 Message Date
Evan Jones
d48d36dc02 pgtype/hstore: Make text parsing about 6X faster
I am working on an application that uses hstore types, and we found
that returning the values is slow, particularly when using the text
protocol, such as when using database/sql. This improves parsing to
be about 6X faster (currently faster than binary). The changes are:

* referencing the original string instead of copying into new strings
  (very large win)
* using strings.IndexByte to scan double quoted strings: it has
  architecture-specific assembly implementations, and most of the
  time is spent in key/value strings.
* estimating the number of key/value pairs to allocate the correct
  size of the slice and map up front. This reduces the number of
  allocations and bytes allocated by a factor of 2, and was a small
  CPU win.
* parsing directly into the Hstore, rather than copying into it.
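
The first three points above can be sketched as follows. This is a minimal illustration, not the code from this commit: parseQuoted and estimatePairs are hypothetical names, and the escape handling assumes that only backslash escapes appear inside double quoted strings.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseQuoted is a hypothetical helper. s must start just after an opening
// double quote. It returns the unescaped contents and the text that follows
// the closing quote.
func parseQuoted(s string) (value, rest string, err error) {
	// Fast path: strings.IndexByte locates the first double quote.
	end := strings.IndexByte(s, '"')
	if end < 0 {
		return "", "", errors.New("unterminated quoted string")
	}
	// With no backslash before that quote, it cannot be escaped, so the
	// value is a substring of the input and nothing is copied.
	if strings.IndexByte(s[:end], '\\') < 0 {
		return s[:end], s[end+1:], nil
	}
	// Slow path: the string contains escapes, so build an unescaped copy.
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '\\':
			i++
			if i >= len(s) {
				return "", "", errors.New("dangling backslash")
			}
			b.WriteByte(s[i])
		case '"':
			return b.String(), s[i+1:], nil
		default:
			b.WriteByte(s[i])
		}
	}
	return "", "", errors.New("unterminated quoted string")
}

// estimatePairs is a hypothetical helper that guesses the number of
// key/value pairs by counting "=>" separators. It can overcount when a key
// or value contains "=>", which is acceptable for a capacity hint.
func estimatePairs(s string) int {
	return strings.Count(s, "=>")
}

func main() {
	input := `"k1"=>"v1", "k2"=>"v2"`
	m := make(map[string]string, estimatePairs(input))
	key, rest, err := parseQuoted(input[1:]) // skip the opening quote
	fmt.Println(key, rest, err, len(m))
}
```

The fast path returns a substring that shares memory with the input, so the input string stays reachable for as long as any parsed key or value is.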

This parser is stricter than the old one. It only accepts hstore
strings serialized by Postgres. The old one was already stricter
than Postgres's own parser, but previously accepted any whitespace
character after a comma. This one only accepts space. Example:

  "k1"=>"v1",\t"k2"=>"v2"

Postgres only ever uses ", " as the separator. See hstore_out:
https://github.com/postgres/postgres/blob/master/contrib/hstore/hstore_io.c

The result of using benchstat to compare the benchmark on my M1 Pro
with the following command line is below. The new text parser is now
faster than the binary parser. I will improve the binary parser in a
separate change.

for i in $(seq 10); do go test ./pgtype -run=none -bench=BenchmarkHstoreScan -benchtime=1s >> new.txt; done

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
                               │  orig.txt   │               new.txt               │
                               │   sec/op    │   sec/op     vs base                │
HstoreScan/databasesql.Scan-10   82.11µ ± 1%   10.51µ ± 0%  -87.20% (p=0.000 n=10)
HstoreScan/text-10               83.30µ ± 1%   11.49µ ± 1%  -86.20% (p=0.000 n=10)
HstoreScan/binary-10             15.99µ ± 2%   15.77µ ± 1%   -1.35% (p=0.007 n=10)
geomean                          47.82µ        12.40µ       -74.08%

                               │   orig.txt   │               new.txt                │
                               │     B/op     │     B/op      vs base                │
HstoreScan/databasesql.Scan-10   56.23Ki ± 0%   11.68Ki ± 0%  -79.23% (p=0.000 n=10)
HstoreScan/text-10               65.12Ki ± 0%   20.58Ki ± 0%  -68.40% (p=0.000 n=10)
HstoreScan/binary-10             21.09Ki ± 0%   21.09Ki ± 0%        ~ (p=0.378 n=10)
geomean                          42.58Ki        17.18Ki       -59.66%

                               │  orig.txt   │               new.txt                │
                               │  allocs/op  │ allocs/op   vs base                  │
HstoreScan/databasesql.Scan-10   744.00 ± 0%   44.00 ± 0%  -94.09% (p=0.000 n=10)
HstoreScan/text-10               743.00 ± 0%   44.00 ± 0%  -94.08% (p=0.000 n=10)
HstoreScan/binary-10              464.0 ± 0%   464.0 ± 0%        ~ (p=1.000 n=10) ¹
geomean                           635.4        96.49       -84.81%
¹ all samples are equal
2023-06-16 15:30:54 -05:00
Evan Jones
1b68b5970e pgtype/hstore: Save 2 allocs in database/sql Scan implementation
Remove unneeded string to []byte to string conversion, which saves 2
allocs and should make Hstore text scanning slightly faster.

The Hstore.Scan() function takes a string as input, converts it to
[]byte, and calls scanPlanTextAnyToHstoreScanner.Scan(). That
function converts []byte back to string and calls parseHstore. This
refactors scanPlanTextAnyToHstoreScanner.Scan into
scanPlanTextAnyToHstoreScanner.scanString so the database/sql Scan
function can call it directly, bypassing this conversion.
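
The shape of this refactoring can be sketched as below. This is a simplified illustration under assumed names: textScanner, ScanBytes, ScanSQL and parsePair are hypothetical stand-ins, not the pgx types, and parsePair is a toy parser rather than parseHstore.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parsePair is a toy stand-in for the real parser: it splits "key=>value".
func parsePair(s string) (string, string, error) {
	k, v, ok := strings.Cut(s, "=>")
	if !ok {
		return "", "", errors.New("missing => separator")
	}
	return k, v, nil
}

// textScanner is a hypothetical stand-in for the scan plan.
type textScanner struct{}

// scanString holds the actual parsing logic and works on a string.
func (textScanner) scanString(s string) (string, string, error) {
	return parsePair(s)
}

// ScanBytes is the []byte entry point. It converts to string once.
func (t textScanner) ScanBytes(src []byte) (string, string, error) {
	return t.scanString(string(src))
}

// ScanSQL is the database/sql style entry point. It already has a string,
// so it calls scanString directly instead of converting to []byte and back.
func (t textScanner) ScanSQL(src string) (string, string, error) {
	return t.scanString(src)
}

func main() {
	var t textScanner
	k, v, err := t.ScanSQL("a=>b")
	fmt.Println(k, v, err)
}
```

Each string to []byte conversion (and the reverse) can allocate and copy, so routing the string caller straight to scanString removes one round trip.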

The added Benchmark shows this saves 2 allocs for longer strings, and
saves about 5% CPU overall on my M1 Pro. benchstat output:

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
              │  orig.txt   │              new.txt               │
              │   sec/op    │   sec/op     vs base               │
HstoreScan-10   1.334µ ± 2%   1.257µ ± 2%  -5.77% (p=0.000 n=10)

              │   orig.txt   │               new.txt               │
              │     B/op     │     B/op      vs base               │
HstoreScan-10   2.094Ki ± 0%   1.969Ki ± 0%  -5.97% (p=0.000 n=10)

              │  orig.txt  │              new.txt              │
              │ allocs/op  │ allocs/op   vs base               │
HstoreScan-10   36.00 ± 0%   34.00 ± 0%  -5.56% (p=0.000 n=10)
2023-06-07 15:35:22 -05:00
Evan Jones
ee04d4a74d pgtype/hstore: Avoid Postgres Mac OS X parsing bug
Postgres on Mac OS X has a bug in how it parses hstore text values
that causes it to misinterpret some Unicode values as spaces. This
causes values sent by pgx to be misinterpreted. To avoid this, always
quote hstore values, which is how Postgres serializes them itself.
The test change fails on Mac OS X without this fix.
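
A minimal sketch of unconditional quoting follows. quoteHstoreText is a hypothetical name, not the function in pgx, and the sketch assumes that backslash and double quote are the only characters that need escaping inside a quoted hstore string. NULL values, which are written unquoted, are not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteHstoreText is a hypothetical helper that always wraps s in double
// quotes, escaping backslashes and double quotes with a backslash.
func quoteHstoreText(s string) string {
	var b strings.Builder
	b.Grow(len(s) + 2) // room for the two surrounding quotes
	b.WriteByte('"')
	for i := 0; i < len(s); i++ {
		if c := s[i]; c == '"' || c == '\\' {
			b.WriteByte('\\')
		}
		b.WriteByte(s[i])
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	pair := quoteHstoreText("key") + "=>" + quoteHstoreText(`va"lue`)
	fmt.Println(pair)
}
```

Because every key and value is quoted, the server never has to decide whether an unquoted byte counts as whitespace, which is the decision the bug described above gets wrong.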

While I suspect this should not be performance critical for any
application, I added a quick benchmark to test the performance of the
encoding. This change actually makes encoding slightly faster on my
M1 Pro. The output from the benchstat program on this benchmark is:

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/pgtype
                          │   orig.txt   │           new-quotes.txt            │
                          │    sec/op    │   sec/op     vs base                │
HstoreSerialize/text-10      207.1n ± 0%   142.3n ± 1%  -31.31% (p=0.000 n=10)
HstoreSerialize/binary-10   100.10n ± 0%   99.64n ± 1%   -0.45% (p=0.013 n=10)
geomean                      144.0n        119.1n       -17.31%

I have also attempted to fix the Postgres bug, but it will take a
long time for this fix to get upstream:

https://www.postgresql.org/message-id/CA%2BHWA9awUW0%2BRV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig%40mail.gmail.com
2023-06-07 15:29:25 -05:00
Evan Jones
eab316e200 pgtype.Hstore: Fix quoting of whitespace; Add test
Before this change, the Hstore text protocol did not quote keys or
values containing non-space whitespace ("\r\n\v\t"). This causes
inserts with these values to fail with errors like:

    ERROR: Syntax error near "r" at position 17 (SQLSTATE XX000)

The previous version also quoted curly braces ("{}"), but they don't
seem to require quoting.

It is possible that it would be easier to just always quote the
values, which is what Postgres does when encoding its text protocol,
but this is a smaller change.
2023-05-16 07:02:55 -05:00
Evan Jones
8ceef73b84 pgtype.parseHstore: Reject invalid input; Fix error messages
The parseHstore function did not check the return value from
p.Consume() after a ', ' sequence. It expects a doublequote '"' that
starts the next key, but would accept any character. This means it
accepted invalid input such as:

    "key1"=>"b", ,key2"=>"value"

Add a unit test that covers this case.
Fix a couple of the nearby error strings while looking at this.

Found by looking at staticcheck warnings:

    pgtype/hstore.go:434:6: this value of end is never used (SA4006)
    pgtype/hstore.go:434:6: this value of r is never used (SA4006)
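
The kind of check described in this message can be sketched as follows. nextKeyStart is a hypothetical helper, not the pgx parser: it only illustrates requiring that the separator is followed by the double quote that opens the next key.

```go
package main

import (
	"fmt"
	"strings"
)

// nextKeyStart is a hypothetical helper. s is the input remaining after a
// value. It requires the exact sequence `, "` and returns the text that
// follows the opening quote of the next key.
func nextKeyStart(s string) (string, error) {
	const sep = `, "`
	if !strings.HasPrefix(s, sep) {
		return "", fmt.Errorf("expected %q before next key, got %q", sep, s)
	}
	return s[len(sep):], nil
}

func main() {
	// The remainder of the invalid example above, after the value "b".
	_, err := nextKeyStart(`, ,key2"=>"value"`)
	fmt.Println(err != nil)
}
```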
2023-05-15 18:10:20 -05:00
Evan Jones
d8b38b28be pgtype/hstore.go: Remove unused quoteHstore{Element,Replacer}
These are unused. The code uses quoteArrayElement instead.
2023-05-13 10:03:22 -05:00
Evan Jones
2a86501e86 Fix hstore NULL versus empty
When running queries with the hstore type registered, and with simple
mode queries, the scan implementation does not correctly distinguish
between NULL and empty. Fix the implementation and add a test to
verify this.
2023-05-13 09:34:30 -05:00
Jack Christensen
f14fb3d692 Replace interface{} with any 2022-04-09 09:12:55 -05:00
Jack Christensen
d13f651810 Finish importing pgio as internal package 2022-02-21 14:35:20 -06:00
Jack Christensen
9c538cd4a9 Remove actualTarget argument 2022-02-21 09:30:01 -06:00
Jack Christensen
1f2f239d09 Renamed pgtype.ConnInfo to pgtype.Map 2022-02-21 09:13:09 -06:00
Jack Christensen
5ed95dcd1c Expose wrap functions on ConnInfo
- Remove rarely used ScanPlan.Scan arguments
- Plus other refactorings and fixes that fell out of this change.
- Plus rows Scan now handles checking for changed type.
2022-01-22 17:50:19 -06:00
Jack Christensen
a6863a7dd2 Convert Hstore to Codec 2022-01-15 17:47:37 -06:00
Jack Christensen
fcc9dcc960 Convert text to Codec
This also entailed updating and deleting types that depended on Text.
2022-01-08 13:13:26 -06:00
Jack Christensen
44214b7854 Import to pgx main repo in pgtype subdir 2021-12-04 13:07:54 -06:00