<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DataTerreno]]></title><description><![CDATA[Field notes on sovereign data, AI agents, and lakehouse architectures - building tools to query, protect, and grow knowledge without giving up control.]]></description><link>https://blog.dataterreno.com</link><image><url>https://substackcdn.com/image/fetch/$s_!FqS9!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3061271c-6772-4d06-9585-e730bcf64df9_256x256.png</url><title>DataTerreno</title><link>https://blog.dataterreno.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 03 Jun 2026 13:36:06 GMT</lastBuildDate><atom:link href="https://blog.dataterreno.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[DataTerreno]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[blog@dataterreno.com]]></webMaster><itunes:owner><itunes:email><![CDATA[blog@dataterreno.com]]></itunes:email><itunes:name><![CDATA[DataTerreno]]></itunes:name></itunes:owner><itunes:author><![CDATA[DataTerreno]]></itunes:author><googleplay:owner><![CDATA[blog@dataterreno.com]]></googleplay:owner><googleplay:email><![CDATA[blog@dataterreno.com]]></googleplay:email><googleplay:author><![CDATA[DataTerreno]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[When SQL Is Not Enough: Exploring Hybrid SQL + Semantic Search in an Agent]]></title><description><![CDATA[Using semantic retrieval as a candidate generator, without giving up SQL as the source of truth]]></description><link>https://blog.dataterreno.com/p/when-sql-is-not-enough-exploring</link><guid 
isPermaLink="false">https://blog.dataterreno.com/p/when-sql-is-not-enough-exploring</guid><dc:creator><![CDATA[DataTerreno]]></dc:creator><pubDate>Tue, 19 May 2026 17:34:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HnXH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the <a href="https://dataterreno.substack.com/breaking-ground">previous post</a> I wrote about the motivation behind DataTerreno: building systems that make data easier to use without giving up control over it.</p><p>This post goes one layer deeper.</p><p>The question I am exploring now is very specific:</p><p><strong>Does it make sense to introduce hybrid SQL + semantic search inside a data agent?</strong></p><p>Not as a replacement for SQL. Not as a magic vector database that suddenly understands the business. And definitely not as a shortcut to skip data modelling.</p><p>The idea is more modest:</p><p>Use semantic search to find candidate rows when the user asks fuzzy, conceptual or free-text questions, and then use SQL to validate, filter, join, aggregate and produce the final grounded evidence.</p><p>That distinction matters.</p><p>Semantic search is good at finding things that are <em>about</em> something. SQL is good at computing exact answers over structured data. 
A useful agent needs both, but it also needs to know where one ends and the other begins.</p><div><hr></div><h2>The limitation I hit with plain SQL agents</h2><p>A classic SQL agent works reasonably well when the question maps cleanly to relational operations.</p><p>For example, in a CRM database:</p><p><em>How many opportunities were closed last month?</em></p><p>That is a natural fit for SQL:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;fff17e72-014b-423d-a888-699d25ad60b7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT COUNT(*)
FROM opportunities
WHERE status = 'Closed Won'
 AND close_date &gt;= DATE '2026-04-01'
 AND close_date &lt; DATE '2026-05-01';</code></pre></div><p>The user asks for a count. The database has a status column and a date column. SQL is exactly the right tool.</p><p>But real questions are often messier:</p><p><em>Show me the opportunities related to cloud migration from last quarter.</em></p><p>Now we have two different kinds of intent mixed together:</p><ol><li><p><strong>Structured filtering</strong>: &#8220;<em>from last quarter</em>&#8221;.</p></li><li><p><strong>Semantic matching</strong>: &#8220;<em>related to cloud migration</em>&#8221;.</p></li></ol><p>The date filter belongs in SQL. The concept &#8220;<em>cloud migration</em>&#8221; may be hidden inside free-text fields such as opportunity notes, project descriptions, meeting summaries or customer requirements.</p><p>Trying to solve this with SQL alone usually means falling back to brittle <code>LIKE</code> predicates:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;67ea7752-d009-4919-b770-5fbe99825876&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">WHERE LOWER(description) LIKE '%cloud%'
 OR LOWER(description) LIKE '%migration%'</code></pre></div><p>That works for obvious cases, but it misses synonyms, paraphrases and domain language:</p><ul><li><p>workload modernization</p></li><li><p>move from on-prem to hosted infrastructure</p></li><li><p>data center exit</p></li><li><p>application replatforming</p></li><li><p>hybrid cloud adoption</p></li></ul><p>This is where semantic search starts to make sense.</p><div><hr></div><h2>The high-level idea</h2><p>At a high level, the agent should not choose between SQL and semantic search. It should orchestrate both.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HnXH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HnXH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 424w, https://substackcdn.com/image/fetch/$s_!HnXH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 848w, https://substackcdn.com/image/fetch/$s_!HnXH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 1272w, https://substackcdn.com/image/fetch/$s_!HnXH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HnXH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png" width="416" height="519.8146167557933" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1402,&quot;width&quot;:1122,&quot;resizeWidth&quot;:416,&quot;bytes&quot;:1027583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HnXH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 424w, https://substackcdn.com/image/fetch/$s_!HnXH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 848w, https://substackcdn.com/image/fetch/$s_!HnXH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 1272w, 
https://substackcdn.com/image/fetch/$s_!HnXH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd28bf094-dfa4-49b1-9b22-c2bb86e32bcf_1122x1402.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The semantic index does not answer the question. 
It only says:</p><p><em>These rows look semantically relevant.</em></p><p>The source database still has to answer:</p><p><em>Which of those rows actually match the structured constraints, permissions, joins and aggregations required by the user?</em></p><p>That is the core design principle I am using.</p><div><hr></div><h2>What we have implemented so far</h2><p>In the current backend, the SQL agent is implemented as a self-contained TAG-style agent.</p><p>The flow is roughly:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MvrQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MvrQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 424w, https://substackcdn.com/image/fetch/$s_!MvrQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 848w, https://substackcdn.com/image/fetch/$s_!MvrQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 1272w, https://substackcdn.com/image/fetch/$s_!MvrQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!MvrQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png" width="328" height="705.6077348066299" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3115,&quot;width&quot;:1448,&quot;resizeWidth&quot;:328,&quot;bytes&quot;:285387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MvrQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 424w, https://substackcdn.com/image/fetch/$s_!MvrQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 848w, https://substackcdn.com/image/fetch/$s_!MvrQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MvrQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7386ba5f-5877-4a66-85b9-0a367e1e2b77_1448x3115.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A few details are important.</p><p>First, semantic search is exposed as an internal agent tool called <code>sql_semantic_search</code>. Its job is to search a private semantic index for configured SQL table rows. 
It returns candidate matches with their primary keys, semantic rank and similarity, plus a ready-to-use SQL CTE snippet.</p><p>Second, the final answer is not allowed to come directly from the semantic index. The tool contract is explicit: semantic matches are private candidate rows; SQL execution over the configured database is still required for final row values, counts, joins and aggregations.</p><p>Third, the SQL agent validates every query before execution. The validator accepts only read-only <code>SELECT</code> or <code>WITH</code> statements and rejects dangerous operations such as <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>, <code>DROP</code>, <code>ALTER</code>, <code>COPY</code>, <code>LOAD</code>, <code>PRAGMA</code>, and similar commands.</p><p>This is not just a safety detail. It is what keeps the agent grounded in a controlled data workflow instead of turning it into a free-form code generator connected to a database.</p><div><hr></div><h2>Choosing what to index</h2><p>One of the first architectural decisions is deciding which columns deserve semantic indexing.</p><p>Not every column should be embedded.</p><p>Good candidates are columns that contain free text and where exact matching is likely to fail:</p><ul><li><p>case descriptions</p></li><li><p>opportunity notes</p></li><li><p>support tickets</p></li><li><p>product reviews</p></li><li><p>contract descriptions</p></li><li><p>tender titles</p></li><li><p>meeting summaries</p></li><li><p>customer requirements</p></li></ul><p>Poor candidates are columns that already have precise structure:</p><ul><li><p>dates</p></li><li><p>numeric amounts</p></li><li><p>foreign keys</p></li><li><p>status codes</p></li><li><p>identifiers</p></li><li><p>booleans</p></li><li><p>controlled categorical values</p></li></ul><p>In the backend this is configured per SQL connection using a semantic profile. 
A table can enable semantic search and declare its primary key and text columns.</p><p>A simplified CRM-style configuration would look like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;b4ccfd93-77ca-48da-b3c6-c2635293916c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "connections": [
    {
      "name": "crm",
      "dialect": "postgresql",
      "connection_uri": "postgresql://...",
      "default_schema": "public",
      "semantic_profile": {
        "tables": {
          "opportunities": {
            "primary_key": ["opportunity_id"],
            "semantic_search": {
              "enabled": true,
              "text_columns": ["title", "description", "next_steps"],
              "batch_size": 500
            }
          }
        }
      }
    }
  ]
}</code></pre></div><p>The important part is that this is explicit. The system should not blindly embed every column in every table. That would create noise, increase cost and make the retrieval layer harder to reason about.</p><div><hr></div><h2>Mapping semantic candidates back to real rows</h2><p>Once a row is embedded, we need a stable way to map the vector back to the original data.</p><p>In the current implementation, each indexed row is linked to the source table through:</p><ul><li><p>tenant id</p></li><li><p>user id</p></li><li><p>connection name</p></li><li><p>source schema</p></li><li><p>source table</p></li><li><p>source key</p></li><li><p>source key hash</p></li></ul><p>The source key is derived from the configured primary key. If the table has a simple primary key, the source key can be just that value. If the table uses a composite key, the key is serialized as a JSON array.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!019G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!019G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!019G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 848w, 
https://substackcdn.com/image/fetch/$s_!019G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!019G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!019G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png" width="394" height="525.3333333333334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1448,&quot;width&quot;:1086,&quot;resizeWidth&quot;:394,&quot;bytes&quot;:896462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!019G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 424w, 
https://substackcdn.com/image/fetch/$s_!019G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!019G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!019G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c54fa8d-4ffe-4cf0-ae48-cb3455588f4f_1086x1448.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is a small but critical detail.</p><p>The semantic index should not become a second source of truth. It should behave more like a search accelerator. It helps the agent find candidates, but the source database is still responsible for returning the actual rows.</p><p>That also keeps the index compact. The current model stores the vector, source identity, content hash, embedding model and synchronization metadata. It does not keep a full duplicated copy of the original row.</p><p>That is good for control and simplicity, but it also creates one of the next challenges: filter pushdown.</p><p>More on that later.</p><div><hr></div><h2>Why CTEs are useful here</h2><p>A CTE, or Common Table Expression, is a temporary named result set inside a SQL query.</p><p>It is defined using <code>WITH </code>and can then be referenced by the main query.</p><p>Simple example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;b696bfa9-b4fd-4afe-aa10-d22ac30f6d0e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">WITH recent_opportunities AS (
 SELECT *
 FROM opportunities
 WHERE close_date &gt;= DATE '2026-04-01'
)
SELECT owner_id, COUNT(*)
FROM recent_opportunities
GROUP BY owner_id;</code></pre></div><p>You can think of it as a readable intermediate step:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YtjQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YtjQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 424w, https://substackcdn.com/image/fetch/$s_!YtjQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 848w, https://substackcdn.com/image/fetch/$s_!YtjQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!YtjQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YtjQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png" width="198" height="310.8281829419036" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1270,&quot;width&quot;:809,&quot;resizeWidth&quot;:198,&quot;bytes&quot;:45308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YtjQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 424w, https://substackcdn.com/image/fetch/$s_!YtjQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 848w, https://substackcdn.com/image/fetch/$s_!YtjQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!YtjQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b009a16-08c1-4e46-9b2b-344c52532d29_809x1270.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In our hybrid SQL + semantic search flow, CTEs are useful because the semantic tool can return a small candidate set as a SQL fragment.</p><p>For example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;8dabe31b-884b-4559-9c25-cd07434398ee&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">WITH semantic_candidates_opportunities(
  opportunity_id,
  semantic_rank,
  semantic_similarity
) AS (
  VALUES
  ('OPP-001', 1, 0.91),
  ('OPP-017', 2, 0.88),
  ('OPP-042', 3, 0.84)
)
SELECT
  o.opportunity_id,
  o.title,
  o.stage,
  o.close_date,
  sem.semantic_rank,
  sem.semantic_similarity
FROM opportunities o
JOIN semantic_candidates_opportunities sem
  ON o.opportunity_id = sem.opportunity_id
WHERE o.close_date &gt;= DATE '2026-04-01'
  AND o.close_date &lt; DATE '2026-05-01'
ORDER BY sem.semantic_rank;</code></pre></div><p>The CTE acts as a bridge between the semantic retrieval layer and the relational database.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6NuM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6NuM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 424w, https://substackcdn.com/image/fetch/$s_!6NuM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 848w, https://substackcdn.com/image/fetch/$s_!6NuM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 1272w, https://substackcdn.com/image/fetch/$s_!6NuM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6NuM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png" width="194" height="391.6733727810651" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1706,&quot;width&quot;:845,&quot;resizeWidth&quot;:194,&quot;bytes&quot;:119289,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6NuM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 424w, https://substackcdn.com/image/fetch/$s_!6NuM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 848w, https://substackcdn.com/image/fetch/$s_!6NuM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 1272w, https://substackcdn.com/image/fetch/$s_!6NuM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bb2fcf-65b9-45a1-ba4b-7498bf60f697_845x1706.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This pattern is simple, debuggable and portable enough to test across different SQL engines.</p><p>It is also very useful for agents because the CTE is explicit. The model does not need to invent a vector function inside SQL. 
It receives concrete candidate keys and joins them back to the source table.</p><div><hr></div><h2>The filtering problem</h2><p>Now comes the interesting part.</p><p>Suppose the user asks:</p><p><em>Find sales of products related to technology during last month.</em></p><p>There are two possible execution strategies.</p><p>The naive strategy is:</p><ol><li><p>Run semantic search for &#8220;<em>technology</em>&#8221; across all products.</p></li><li><p>Get the top semantic candidates.</p></li><li><p>Use SQL to filter those candidates to last month&#8217;s sales.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xw-P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xw-P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!xw-P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!xw-P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!xw-P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xw-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png" width="254" height="338.6666666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1448,&quot;width&quot;:1086,&quot;resizeWidth&quot;:254,&quot;bytes&quot;:920810,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xw-P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!xw-P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!xw-P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xw-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F472476ae-43c8-4d73-8944-2ff857f70efe_1086x1448.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This works, but it can be inefficient and noisy.</p><p>Why?</p><p>Because the semantic search was performed over the full product universe, not over the subset of products that were actually sold last month.</p><p>The top semantic results may include products that are conceptually related to technology but irrelevant to the time window. 
By the time SQL applies the date filter, many good candidates may already have been lost because the vector search spent its top-k budget on rows outside the structured constraint.</p><p>The better strategy is:</p><ol><li><p>Use SQL-like filters first to restrict the candidate universe.</p></li><li><p>Run semantic search only inside that filtered universe.</p></li><li><p>Return candidates that are both semantically relevant and structurally valid.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FqOH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FqOH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 424w, https://substackcdn.com/image/fetch/$s_!FqOH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 848w, https://substackcdn.com/image/fetch/$s_!FqOH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 1272w, https://substackcdn.com/image/fetch/$s_!FqOH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FqOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png" width="216" height="449.05263157894734" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:418,&quot;resizeWidth&quot;:216,&quot;bytes&quot;:45538,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FqOH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 424w, https://substackcdn.com/image/fetch/$s_!FqOH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 848w, https://substackcdn.com/image/fetch/$s_!FqOH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 1272w, https://substackcdn.com/image/fetch/$s_!FqOH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59d4eba5-664b-45f5-ba2e-61033bda4fdb_418x869.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is the architectural tension.</p><p>To do filtered semantic search efficiently, the semantic index needs to know more than just embeddings. 
It also needs enough structured metadata to apply filters before vector search:</p><ul><li><p>dates</p></li><li><p>tenant / user scope</p></li><li><p>source table</p></li><li><p>identifiers</p></li><li><p>foreign keys</p></li><li><p>status fields</p></li><li><p>maybe some categorical dimensions</p></li></ul><p>That means the semantic index starts to look like a synchronized projection of part of the SQL database.</p><p>And that requires a synchronization mechanism.</p><div><hr></div><h2>Where the current implementation stands</h2><p>The current backend already has the synchronization mechanism for semantic indexing.</p><p>It scans configured source tables in batches, extracts the configured primary key and text columns, builds an embedding payload from the text, computes hashes, calls the embedding provider in batches and upserts rows into a PostgreSQL/pgvector-backed semantic index.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kSJ-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kSJ-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!kSJ-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 848w, 
https://substackcdn.com/image/fetch/$s_!kSJ-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!kSJ-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kSJ-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png" width="369" height="492" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1448,&quot;width&quot;:1086,&quot;resizeWidth&quot;:369,&quot;bytes&quot;:1024321,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kSJ-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 424w, 
https://substackcdn.com/image/fetch/$s_!kSJ-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!kSJ-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!kSJ-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8297347f-97cf-49fd-9c62-546f85fbc758_1086x1448.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The index is scoped by tenant and user. That is important because semantic retrieval is not a public search operation. It happens inside the user&#8217;s private context.</p><p>The current implementation also keeps the semantic index intentionally compact. Legacy payload columns are cleared when possible, and the index keeps the source key and embedding metadata rather than becoming a full copy of the source table.</p><p>That is a good first step.</p><p>But it also means that full filter pushdown is not implemented yet.</p><p>At query time, the current agent usually does this:</p><ol><li><p>Run semantic search across the configured semantic table.</p></li><li><p>Get candidate primary keys.</p></li><li><p>Build a CTE with those candidates.</p></li><li><p>Join the CTE back to the source table.</p></li><li><p>Apply SQL filters such as dates, status or exact identifiers.</p></li><li><p>Generate the final answer only from executed SQL results.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VYWL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VYWL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 424w, https://substackcdn.com/image/fetch/$s_!VYWL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 848w, 
https://substackcdn.com/image/fetch/$s_!VYWL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 1272w, https://substackcdn.com/image/fetch/$s_!VYWL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VYWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png" width="217" height="509.83076923076925" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1069,&quot;width&quot;:455,&quot;resizeWidth&quot;:217,&quot;bytes&quot;:67097,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VYWL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 424w, 
https://substackcdn.com/image/fetch/$s_!VYWL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 848w, https://substackcdn.com/image/fetch/$s_!VYWL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 1272w, https://substackcdn.com/image/fetch/$s_!VYWL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb3bd5a-329e-4c08-8336-7c6b0599f26d_455x1069.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This gives us grounding and correctness control, but not always optimal retrieval efficiency.</p><p>If the structured filter is very selective, we still pay the cost of searching the broader semantic space first. The SQL filter happens after candidate retrieval, not before it.</p><p>So the current implementation is a working hybrid pattern, but not the final optimized version.</p><div><hr></div><h2>Why not just put everything into the vector database?</h2><p>One possible reaction is:</p><p>Why not store all fields in the vector database and query everything there?</p><p>Because then we risk rebuilding a weaker database next to the real one.</p><p>SQL databases already know how to filter, join, count, sort, group, aggregate and enforce structure. Vector databases are useful, but they should not become the place where we reimplement relational semantics badly.</p><p>The more useful direction, at least for what I am exploring, is a controlled projection:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7ln7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7ln7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 424w, https://substackcdn.com/image/fetch/$s_!7ln7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 848w, 
https://substackcdn.com/image/fetch/$s_!7ln7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!7ln7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7ln7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png" width="404" height="303" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1086,&quot;width&quot;:1448,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:885632,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7ln7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 424w, 
https://substackcdn.com/image/fetch/$s_!7ln7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 848w, https://substackcdn.com/image/fetch/$s_!7ln7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!7ln7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc11f8986-b6e3-4c9c-903c-79581efc0096_1448x1086.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The projection should contain only what semantic retrieval needs:</p><ul><li><p>the embedding</p></li><li><p>the source key</p></li><li><p>enough metadata to pre-filter safely</p></li><li><p>synchronization hashes</p></li><li><p>scope and ownership fields</p></li></ul><p>Not the whole operational dataset.</p><p>This keeps a clear boundary:</p><ul><li><p>SQL remains the system of record.</p></li><li><p>The semantic index remains an acceleration and discovery layer.</p></li><li><p>The agent orchestrates both.</p></li></ul><div><hr></div><h2>The role of filter columns</h2><p>The backend already has the concept of profile-declared filter columns at the agent level.</p><p>This is useful when the user explicitly scopes a value to a field.</p><p>For example:</p><p><em>Show opportunities where customer is ACME Corp.</em></p><p>The word &#8220;<em>ACME Corp</em>&#8221; should not be treated as a broad semantic concept. It is an exact field constraint. 
The agent should bind it to the configured source column, such as customer_name, and use SQL.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mPxl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mPxl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 424w, https://substackcdn.com/image/fetch/$s_!mPxl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 848w, https://substackcdn.com/image/fetch/$s_!mPxl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!mPxl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mPxl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png" width="270" height="444.897466827503" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1366,&quot;width&quot;:829,&quot;resizeWidth&quot;:270,&quot;bytes&quot;:92300,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mPxl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 424w, https://substackcdn.com/image/fetch/$s_!mPxl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 848w, https://substackcdn.com/image/fetch/$s_!mPxl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 1272w, https://substackcdn.com/image/fetch/$s_!mPxl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f4389e-0012-4126-90c4-8676c6987d65_829x1366.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>This avoids a common failure mode: using semantic search when the user actually gave an explicit structured condition.</p><p>However, this is different from true metadata filtering inside the semantic index.</p><p>Today, the agent can use filter hints to decide when not to run semantic search, or how to apply SQL constraints after candidates are retrieved. 
The next optimization step is to persist selected filterable metadata in the semantic index itself so that semantic retrieval can be narrowed before vector ranking.</p><p>That would make questions like this more efficient:</p><p>Show support cases related to authentication errors opened last week.</p><p>Instead of searching every support case ever indexed, the semantic search could first restrict the index to cases opened last week, then rank only that subset for &#8220;authentication errors&#8221;.</p><div><hr></div><h2>A possible next architecture</h2><p>The next version of the hybrid search layer could look like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZuxE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZuxE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!ZuxE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!ZuxE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!ZuxE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZuxE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png" width="335" height="446.6666666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1448,&quot;width&quot;:1086,&quot;resizeWidth&quot;:335,&quot;bytes&quot;:1006629,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZuxE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!ZuxE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!ZuxE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZuxE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F974809e1-ae77-4db0-9cd1-45c92421b166_1086x1448.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In a CRM example, the semantic projection for opportunities might include:</p><p><code>embedding: vector(description + notes + next_steps)<br>source_key: opportunity_id<br>source_table: opportunities<br>owner_id: structured metadata<br>account_id: structured metadata<br>stage: structured metadata<br>close_date: structured metadata<br>created_at: structured 
metadata<br>updated_at: structured metadata</code></p><p>Then the agent can push down filters like:</p><ul><li><p>last month</p></li><li><p>account = ACME</p></li><li><p>owner = Maria</p></li><li><p>stage = Closed Won</p></li><li><p>region = EMEA</p></li></ul><p>before ranking semantically.</p><p>That reduces noise, improves recall inside the relevant subset and avoids wasting the top-k semantic window on rows that SQL will later discard.</p><div><hr></div><h2>But synchronization becomes a real subsystem</h2><p>The moment we keep a semantic projection of SQL data, we need to answer uncomfortable questions:</p><ul><li><p>How often is the index refreshed?</p></li><li><p>Is synchronization full, incremental or event-driven?</p></li><li><p>How do we detect deleted rows?</p></li><li><p>What happens when the embedding model changes?</p></li><li><p>How do we avoid indexing stale data?</p></li><li><p>How do we handle permissions per tenant and user?</p></li><li><p>Which metadata columns are safe and useful to duplicate?</p></li><li><p>How do we avoid turning the semantic index into an uncontrolled data copy?</p></li></ul><p>The current backend already deals with some of this.</p><p>It uses content hashes to detect whether a text payload changed. It tracks the embedding model, so rows can be refreshed when the model changes. It uses a synchronization run ID and last_seen_at metadata. It only deletes stale rows after a complete source scan, avoiding unsafe deletion after partial syncs. It supports batch sizes, embedding batch sizes, dry runs, row limits, batch limits and commit intervals.</p><p>That is the kind of boring machinery that makes the architecture real.</p><p>The exciting part is &#8220;talk to your data&#8221;. 
The hard part is keeping the index coherent enough that the assistant does not talk to yesterday&#8217;s data by mistake.</p><div><hr></div><h2>Why this belongs inside an agent</h2><p>Hybrid SQL + semantic search could be implemented as a fixed pipeline, but an agent adds something useful: decision-making.</p><p>Not every question needs semantic search.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zwil!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zwil!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 424w, https://substackcdn.com/image/fetch/$s_!zwil!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 848w, https://substackcdn.com/image/fetch/$s_!zwil!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 1272w, https://substackcdn.com/image/fetch/$s_!zwil!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zwil!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png" width="526" height="367.04395604395603" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1016,&quot;width&quot;:1456,&quot;resizeWidth&quot;:526,&quot;bytes&quot;:156970,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zwil!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 424w, https://substackcdn.com/image/fetch/$s_!zwil!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 848w, https://substackcdn.com/image/fetch/$s_!zwil!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 1272w, https://substackcdn.com/image/fetch/$s_!zwil!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b9bba0-ba2c-4ec4-b4cf-8a48259ad4e7_2120x1479.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>For example:</p><ul><li><p>&#8220;<em>Give me opportunity OPP-123</em>&#8221; should be a direct identifier lookup.</p></li><li><p>&#8220;<em>How many opportunities closed last month?</em>&#8221; should be SQL only.</p></li><li><p>&#8220;Show opportunities related to cloud migration&#8221; may need semantic search.</p></li><li><p>&#8220;<em>Show ACME opportunities related to authentication issues opened last week</em>&#8221; may need both.</p></li></ul><p>The agent&#8217;s job is not just to generate SQL. It has to decide which evidence path makes sense.</p><p>In the backend, this is handled through a query-intent plan. 
The agent extracts hints such as:</p><ul><li><p>base table</p></li><li><p>text columns</p></li><li><p>identifier columns</p></li><li><p>literal terms</p></li><li><p>strong literal terms</p></li><li><p>date filters</p></li><li><p>whether the user wants a count, detail list or ranking</p></li><li><p>whether semantic search should be skipped</p></li></ul><p>This is still heuristic and under active refinement, but it is already useful. It prevents some bad behavior, such as running semantic search for exact identifiers or treating a column-scoped filter as free text.</p><div><hr></div><h2>Semantic candidates are not counts</h2><p>One important lesson: a semantic candidate set is not the same thing as a complete result set.</p><p>If the semantic tool returns 30 candidates, that does not mean there are only 30 matching rows in the database.</p><p>It means:</p><p>The semantic retriever returned 30 candidate rows under the current retrieval settings.</p><p>This distinction matters for questions like:</p><p>How many opportunities are related to cloud migration?</p><p>If we simply count the top 30 semantic candidates, we are not answering the question. We are counting the retrieval window.</p><p>The current implementation carries this warning explicitly in the semantic payload: the candidate count is not a full database count unless a separate SQL count over a complete predicate is executed.</p><p>This is one of the reasons I do not want the final answer to come directly from vector search. Retrieval is evidence discovery. 
SQL is where exact computation should happen.</p><div><hr></div><h2>The pragmatic trade-off</h2><p>So, is hybrid SQL + semantic search inside an agent a good idea?</p><p>My current answer is:</p><p><strong>Yes, but only if semantic search is treated as candidate generation, not as the source of truth.</strong></p><p>The pattern solves a real problem: SQL agents struggle with conceptual free-text questions, and pure RAG struggles with structured computation. Combining them gives the agent a better chance of answering questions that contain both fuzzy meaning and exact constraints.</p><p>But it introduces new architectural complexity:</p><ul><li><p>semantic profiles must be designed per table;</p></li><li><p>text columns must be selected carefully;</p></li><li><p>primary keys must be stable;</p></li><li><p>synchronization must be reliable;</p></li><li><p>stale rows must be handled safely;</p></li><li><p>filter pushdown requires metadata duplication;</p></li><li><p>query planning must decide when semantic search is useful;</p></li><li><p>final answers must remain grounded in executed SQL results.</p></li></ul><p>This is not a free lunch.</p><p>It is more like adding a new implement to the data field: useful, but only if it is attached to the right machinery.</p><div><hr></div><h2>Where I am going next</h2><p>The current implementation is a first working version, not the final design.</p><p>The next area I want to optimize is filtered semantic retrieval.</p><p>The goal is to move from this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DGUM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!DGUM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 424w, https://substackcdn.com/image/fetch/$s_!DGUM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 848w, https://substackcdn.com/image/fetch/$s_!DGUM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 1272w, https://substackcdn.com/image/fetch/$s_!DGUM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DGUM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png" width="606" height="120.5303867403315" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:288,&quot;width&quot;:1448,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:380380,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DGUM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 424w, https://substackcdn.com/image/fetch/$s_!DGUM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 848w, https://substackcdn.com/image/fetch/$s_!DGUM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 1272w, https://substackcdn.com/image/fetch/$s_!DGUM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6d2b7a3-877a-48d8-91e0-8249a1ee169f_1448x288.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>To this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EV4J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EV4J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 424w, https://substackcdn.com/image/fetch/$s_!EV4J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 848w, 
https://substackcdn.com/image/fetch/$s_!EV4J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 1272w, https://substackcdn.com/image/fetch/$s_!EV4J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EV4J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png" width="702" height="110.53591160220995" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:228,&quot;width&quot;:1448,&quot;resizeWidth&quot;:702,&quot;bytes&quot;:318063,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198425462?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EV4J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 424w, 
https://substackcdn.com/image/fetch/$s_!EV4J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 848w, https://substackcdn.com/image/fetch/$s_!EV4J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 1272w, https://substackcdn.com/image/fetch/$s_!EV4J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5aa2e2a5-1b7a-49ee-acdd-7c106ade1c37_1448x228.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>That requires extending the semantic index from a pure vector + source-key structure into a controlled, synchronized semantic projection with selected filterable fields.</p><p>I do not know yet if this will be the most efficient approach in every case. There may be better options depending on the database engine, vector backend, data volume, query pattern and latency constraints.</p><p>But this is the path I am exploring now.</p><p>It keeps the database in charge of structured truth. It gives the agent a way to work with meaning. 
And it preserves a clear architectural boundary between search, computation and answer generation.</p><p>That boundary is where most of the interesting work is happening.</p><p>Because &#8220;talk to your data&#8221; sounds simple.</p><p>But once the questions become real, the system has to cultivate both sides of the field: the structure of the database and the semantics of human language.</p>]]></content:encoded></item><item><title><![CDATA[Breaking Ground]]></title><description><![CDATA[Where data engineering meets AI: building tools to understand, protect, and grow your own knowledge]]></description><link>https://blog.dataterreno.com/p/breaking-ground</link><guid isPermaLink="false">https://blog.dataterreno.com/p/breaking-ground</guid><dc:creator><![CDATA[DataTerreno]]></dc:creator><pubDate>Tue, 19 May 2026 12:27:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c3d088c6-c93a-4d83-8d1e-7957a0f3aad3_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Why this blog exists</strong></h2><p>Data has become one of the most valuable assets any person, team or organization can own. But owning data is not the same as controlling it. 
And controlling it is not the same as understanding it.</p><p>That gap is where this blog starts.</p><p>Over the last few years, I have been working with data platforms, Data Lakehouse architectures, language models, conversational assistants and agentic AI. What began as an academic project (&#8220;Talk to your data&#8221;, a final degree project about <a href="https://arxiv.org/abs/2408.14717">Table-Augmented Generation</a>) has gradually turned into a much broader exploration: how can we make data more accessible without giving up control over it?</p><p>That question is technical, but it is also strategic.</p><p>Because every time an organization moves its data, its models, its metadata, its identity layer or its analytical workflows into someone else&#8217;s black box, it gains convenience but loses part of its autonomy. Sometimes that trade-off makes sense. Sometimes it does not. But it should always be a conscious decision.</p><p>This blog is a place to document that journey.</p><p>Not as a collection of polished marketing stories, but as a field notebook: things tested, things broken, architectures explored, design decisions, failed assumptions, small discoveries, technical trade-offs and lessons learned while building systems around data sovereignty, AI and modern analytics.</p><div><hr></div><h2>The core idea: cultivate your own knowledge</h2><p>At DataTerreno, the metaphor is simple: data is raw land.</p><p>You do not get value from land just by owning it. You need to work it. You need to prepare it, clean it, structure it, protect it and make it productive. The same is true of data.</p><p>Raw data is messy. It comes from different systems, in different formats, with different levels of quality and trust. Before it can support decisions, it needs engineering. Before it can feed AI, it needs context. 
Before it can become knowledge, it needs to be governed.</p><p>But there is a second part to the metaphor: the land is yours.</p><p>The point is not to hand over control of your data to intermediaries and hope for the best. The point is to build tools that allow organizations to understand, protect and grow their own knowledge without unnecessary dependencies.</p><p>That does not mean rejecting cloud, commercial platforms or managed services by default. It means avoiding blind dependency. It means keeping architectural options open. It means understanding what is happening under the hood. It means being able to decide where your data lives, who can access it, what models process it and how the results are audited.</p><p>Data sovereignty is not nostalgia for on-premises infrastructure. It is the ability to choose.</p><div><hr></div><h2>From &#8220;talk to your data&#8221; to agentic systems</h2><p>The first major milestone in this journey was my final degree project: an assistant capable of interacting with structured data using natural language.</p><p>The original idea was simple to explain but difficult to implement well: what if a user could ask a database a question in plain language and receive a useful answer without knowing SQL, table schemas or query engines?</p><p>Classic Text-to-SQL is part of the answer, but not the whole answer. Translating a question into SQL works well when the question maps cleanly to relational operations. But real questions are often messier. 
They may require interpretation, classification, summarization, external knowledge or multi-step reasoning.</p><p>That is where Table-Augmented Generation becomes interesting.</p><p>TAG connects three worlds:</p><ol><li><p>The precision of database systems.</p></li><li><p>The semantic flexibility of language models.</p></li><li><p>The conversational interface that makes data accessible to more people.</p></li></ol><p>Instead of treating the language model as a magic box, the system must orchestrate it with tools: inspect schemas, generate queries, execute them, validate results and compose a final answer grounded in real data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nj0L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nj0L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 424w, https://substackcdn.com/image/fetch/$s_!nj0L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 848w, https://substackcdn.com/image/fetch/$s_!nj0L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!nj0L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nj0L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png" width="1448" height="1086" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1086,&quot;width&quot;:1448,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1217361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198393116?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nj0L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 424w, https://substackcdn.com/image/fetch/$s_!nj0L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 848w, https://substackcdn.com/image/fetch/$s_!nj0L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 1272w, https://substackcdn.com/image/fetch/$s_!nj0L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4065a3b9-3a09-4a71-94a7-42221050464a_1448x1086.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This naturally led to agentic patterns such as <a href="https://arxiv.org/abs/2210.03629">ReAct</a>, where the model alternates between reasoning and acting. 
In practice, that means the assistant can think about the task, decide to inspect a schema, execute SQL, observe the result, correct itself if needed and then produce an answer.</p><p>That loop is where many interesting problems appear: hallucinations, invalid SQL, schema ambiguity, context limits, tool errors, infinite loops, unclear user intent, permission boundaries and evaluation difficulties.</p><p>Those problems are exactly the kind of material this blog will explore.</p><div><hr></div><h2>The platform underneath matters</h2><p>A conversational assistant is only as useful as the data platform behind it.</p><p>That is why another line of work has focused on building and orchestrating an on-premises Data Lakehouse platform: object storage, metadata services, query engines, identity, permissions, deployment automation and operational visibility.</p><p>The goal is not to rebuild a hyperscaler from scratch. The goal is to understand which components are needed to run a modern data platform under your own control.</p><p>A typical pattern looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EGug!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EGug!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 424w, https://substackcdn.com/image/fetch/$s_!EGug!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!EGug!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!EGug!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EGug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fae61349-9868-4521-8962-0a108b321d40_1600x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163440,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://dataterreno.substack.com/i/198393116?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EGug!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 424w, 
https://substackcdn.com/image/fetch/$s_!EGug!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 848w, https://substackcdn.com/image/fetch/$s_!EGug!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!EGug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffae61349-9868-4521-8962-0a108b321d40_1600x1200.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A sovereign architecture is not just about running services on-premises. It is about keeping data, metadata, identity, permissions, inference and auditability under conscious control.</p><p>The hard part is not deploying containers. The hard part is making the system coherent:</p><ul><li><p>identity should be centralized;</p></li><li><p>permissions should follow the user;</p></li><li><p>data access should be auditable;</p></li><li><p>models should not bypass governance;</p></li><li><p>assistants should not invent answers;</p></li><li><p>generated artifacts should be traceable;</p></li><li><p>and the platform should remain portable.</p></li></ul><p>This is where data architecture and AI architecture stop being separate disciplines.</p><div><hr></div><h2>What I will write about</h2><p>This blog will be mainly technical, but not purely theoretical. The idea is to write from the trenches: what I am building, testing, debugging and learning.</p><p>Some of the topics I expect to cover:</p><p><strong>Data sovereignty and architecture</strong>. What it means to design platforms that preserve control, portability and independence. Where on-premises makes sense, where cloud makes sense, and where hybrid models become unavoidable.</p><p><strong>Data Lakehouse platforms</strong>. Storage, compute, metadata, federation, open formats, Kubernetes deployments, identity integration, observability and operational patterns.</p><p><strong>AI assistants over data</strong>. TAG, Text-to-SQL, RAG, semantic search, SQL agents, tool calling, query validation, result grounding and conversational interfaces.</p><p><strong>Agentic AI</strong>. ReAct-style loops, supervisors, worker agents, tool registries, traces, memory, routing, failure handling and the difference between a demo agent and a reliable system.</p><p><strong>Evaluation</strong>. 
How to test assistants that interact with data. Benchmarks, difficulty levels, deterministic validation, qualitative metrics and why &#8220;it answered something plausible&#8221; is not good enough.</p><p><strong>Security and governance</strong>. Authentication, authorization, guardrails, audit trails, tenant isolation, data permissions and the risks of connecting LLMs directly to enterprise data.</p><p><strong>Open-source and local AI</strong>. Running models with vLLM, using open models, deploying inference services locally, embedding models, GPU constraints and the trade-offs between control and convenience.</p><p>And probably many unexpected things along the way.</p><p>Because every real project starts with a clean diagram and ends with logs, edge cases and uncomfortable questions.</p><div><hr></div><h2>The tone of this blog</h2><p>This will not be a blog about hype.</p><p>AI is useful, but it is not magic. LLMs can reason surprisingly well in some contexts and fail absurdly in others. Agents can automate workflows, but they can also loop, hallucinate or call the wrong tool with total confidence. Data platforms can provide powerful abstractions, but they still depend on good engineering.</p><p>So the tone here will be pragmatic.</p><p>When something works, I will explain why I think it works. When something fails, I will try to understand the failure. When a design decision has trade-offs, I will make them explicit. When I am not sure, I will say so.</p><p>The objective is not to sell a perfect architecture. The objective is to learn in public.</p><div><hr></div><h2>A starting point</h2><p>This blog starts from one conviction:</p><p>Organizations should be able to extract value from their data without losing control over it.</p><p>That requires more than technology. It requires architecture, governance, security, usability and a clear understanding of what AI can and cannot do.</p><p>But technology matters. Tools shape possibilities. 
If the only practical way to use AI over your data is to send everything to an external platform, then sovereignty becomes theoretical. If only technical experts can access information, then data-driven decision-making remains limited. If assistants cannot be audited, they cannot be trusted.</p><p>So the challenge is clear: build systems that make data easier to use while keeping it under control.</p><p>That is the ground this blog will cultivate.</p>]]></content:encoded></item></channel></rss>